IEEE/ACM Transactions on Audio, Speech, and Language Processing

Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition

Abstract

Although great progress has been made in automatic speech recognition, significant performance degradation still exists in noisy environments. Recently, very deep convolutional neural networks (CNNs) have been successfully applied to computer vision and speech recognition tasks. Based on our previous work on very deep CNNs, in this paper that architecture is developed further to improve recognition accuracy for noise-robust speech recognition. In the proposed very deep CNN architecture, we study the best configuration for the sizes of filters, pooling, and input feature maps: the filter and pooling sizes are reduced and the input feature dimensions are extended to allow adding more convolutional layers. Appropriate pooling, padding, and input feature map selection strategies are then investigated and applied to the very deep CNN to make it more robust for speech recognition. In addition, an in-depth analysis of the architecture reveals key characteristics such as compact model scale, fast convergence, and noise robustness. The proposed model is evaluated on two tasks: the Aurora4 task, with multiple additive noise types and channel mismatch, and the AMI meeting transcription task, with significant reverberation. Experiments on both tasks show that the proposed very deep CNNs can significantly reduce the word error rate (WER) for noise-robust speech recognition. The best architecture obtains a 10.0% relative WER reduction over the traditional CNN on AMI, competitive with the long short-term memory recurrent neural network (LSTM-RNN) acoustic model. On Aurora4, even without feature enhancement, model adaptation, or sequence training, it achieves a WER of 8.81%, a 17.0% relative improvement over the LSTM-RNN. To our knowledge, this is the best published result on Aurora4.
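
The core idea of the abstract, stacking many small convolutions with small pooling windows and padded ("same") convolutions so that depth can grow before the time-frequency maps shrink away, can be illustrated with a minimal PyTorch sketch. This is not the paper's implementation: the layer counts, channel widths, the 64-frame by 64-filterbank input size, the three feature streams (static/delta/delta-delta), and the 3,000-senone output are all assumptions chosen only to make the idea concrete.

```python
import torch
import torch.nn as nn


class VeryDeepCNN(nn.Module):
    """Illustrative very deep CNN acoustic model (a sketch, not the paper's
    exact configuration): many small 3x3 convolutions, small 2x2 pooling,
    and padded convolutions so more conv layers fit before the feature
    maps become too small."""

    def __init__(self, n_senones=3000, in_channels=3):
        super().__init__()

        def block(c_in, c_out, n_convs):
            # n_convs padded 3x3 convolutions followed by one small pooling step.
            layers = []
            for i in range(n_convs):
                layers += [
                    nn.Conv2d(c_in if i == 0 else c_out, c_out,
                              kernel_size=3, padding=1),  # small filters, "same" padding
                    nn.ReLU(inplace=True),
                ]
            layers.append(nn.MaxPool2d(kernel_size=2))      # small pooling window
            return nn.Sequential(*layers)

        self.features = nn.Sequential(
            block(in_channels, 64, 2),
            block(64, 128, 2),
            block(128, 256, 3),
            block(256, 256, 3),
        )
        # Assumes a 64-frame x 64-filterbank input, halved by each of the
        # four pooling layers: 64 -> 4 along both axes.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 4 * 4, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, n_senones),  # senone posteriors for the HMM decoder
        )

    def forward(self, x):
        return self.classifier(self.features(x))


if __name__ == "__main__":
    model = VeryDeepCNN()
    # Dummy batch: 8 windows, 3 feature streams, 64 frames x 64 filterbank bins.
    x = torch.randn(8, 3, 64, 64)
    print(model(x).shape)  # torch.Size([8, 3000])
```

The extended input context (more frames and filterbank channels than a conventional CNN front end) is what makes room for the repeated pooling steps; with a smaller input, four 2x2 poolings would collapse the feature maps before this many convolutional layers could be stacked.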
