Multistream sparse representation features for noise robust audio-visual speech recognition

Peng Shen; Satoru Hayamizu; Satoshi Tamura

Abstract

In this paper, we propose exemplar-based sparse representation features for noise-robust audio-visual speech recognition. First, we introduce a sparse representation technique and describe how it achieves noise robustness through noise reduction. Then, we propose feature fusion methods that combine audio-visual features with the sparse representation. Our work provides new insight into two crucial issues in automatic speech recognition: noise reduction and robust audio-visual features. For noise reduction, we describe a method in which the sparse representation maps speech and noise into different subspaces so that the noise can be reduced. The method applies not only to audio noise reduction but also to visual noise reduction, for several types of noise. For the second issue, we investigate two feature fusion methods, late feature fusion and the joint sparsity model method, to calculate audio-visual sparse representation features that improve the accuracy of audio-visual speech recognition. Our proposed method can thus also contribute to feature fusion in audio-visual speech recognition systems. Finally, we evaluate the new sparse representation features on an audio-visual speech recognition database. The results show the effectiveness of the proposed noise reduction in both the audio and visual cases for several types of noise, and the effectiveness of the audio-visual features determined by the joint sparsity model, in comparison with the late feature fusion method and traditional methods.
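
The noise reduction idea in the abstract (speech and noise explained by different parts of a sparse representation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the dictionaries are random non-negative stand-ins for real speech and noise exemplars, the names `sparse_code` and `denoise` are hypothetical, and a non-negative ISTA solver is swapped in for whatever solver the paper actually uses.

```python
import numpy as np

def sparse_code(y, A, lam=0.01, n_iter=2000):
    """Non-negative ISTA for: min_x 0.5*||y - A x||^2 + lam*sum(x), x >= 0.

    Illustrative solver only; the paper may use a different one.
    """
    # Step size 1/L, where L is the squared spectral norm of A.
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        # Gradient step, then shrink by lam and project onto x >= 0.
        x = np.maximum(x - step * (grad + lam), 0.0)
    return x

def denoise(y, A_speech, A_noise, lam=0.01, n_iter=2000):
    """Estimate the speech part of y using a combined speech+noise dictionary."""
    A = np.hstack([A_speech, A_noise])
    x = sparse_code(y, A, lam, n_iter)
    x_speech = x[:A_speech.shape[1]]
    # Rebuild the observation from the speech atoms only; the weights that
    # landed on noise atoms are discarded.
    return A_speech @ x_speech, x

# Toy data (assumed): unit-norm random exemplars in a 64-dimensional feature space.
rng = np.random.default_rng(0)
A_speech = np.abs(rng.normal(size=(64, 10)))
A_speech /= np.linalg.norm(A_speech, axis=0)
A_noise = np.abs(rng.normal(size=(64, 5)))
A_noise /= np.linalg.norm(A_noise, axis=0)

clean = A_speech[:, 3]                # one speech exemplar
noisy = clean + 0.5 * A_noise[:, 0]   # plus a scaled noise exemplar

estimate, weights = denoise(noisy, A_speech, A_noise)
print("error before:", np.linalg.norm(noisy - clean))
print("error after: ", np.linalg.norm(estimate - clean))
```

The same routine could in principle be applied to visual features by swapping in visual exemplar dictionaries, which is the sense in which the abstract describes the approach as covering both modalities.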

Bibliographic details

Source: Acoustical Science and Technology, 2014, No. 1, 11 pages
Authors: Peng Shen; Satoru Hayamizu; Satoshi Tamura
Original format: PDF
Chinese Library Classification: Acoustics

Similar literature

1. Exemplar-Based Sparse Representations for Noise Robust Automatic Speech Recognition [J]. Gemmeke J. F., Virtanen T., Hurmalainen A. IEEE Transactions on Audio, Speech, and Language Processing. 2011, No. 7.
2. Sparse Auditory Reproducing Kernel (SPARK) Features for Noise-Robust Speech Recognition [J]. Fazel A., Chakrabartty S. IEEE Transactions on Audio, Speech, and Language Processing. 2012, No. 4.
3. Noise robust speech recognition system using multimodal audio-visual approach using different deep learning classification techniques [J]. Eslam E. El Maghraby, Amr M. Gody, Mohamed Hesham Farouk. International Journal of Advanced Computer Research. 2020, No. 47.
4. Feature reconstruction using sparse imputation for noise robust audio-visual speech recognition [C]. Shen Peng, Tamura Satoshi, Hayamizu Satoru. 2012 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. 2012.
5. A Practical and Efficient Multistream Framework for Noise Robust Speech Recognition [D]. Mallidi, Sri Harish. 2018.
6. A Multistream Feature Framework Based on Bandpass Modulation Filtering for Robust Speech Recognition [O]. Sridhar Krishna Nemala, Kailash Patil, Mounya Elhilali.
7. Multistream sparse representation features for noise robust audio-visual speech recognition [O]. Peng Shen, Satoshi Tamura, Satoru Hayamizu. 2014.
