IEEE/ACM Transactions on Audio, Speech, and Language Processing

Very Deep Convolutional Neural Networks for Noise Robust Speech Recognition

Abstract

Although great progress has been made in automatic speech recognition, significant performance degradation still exists in noisy environments. Recently, very deep convolutional neural networks (CNNs) have been successfully applied to computer vision and speech recognition tasks. Based on our previous work on very deep CNNs, in this paper that architecture is developed further to improve recognition accuracy for noise-robust speech recognition. In the proposed very deep CNN architecture, we study the best configuration for the sizes of filters, pooling, and input feature maps: the filter and pooling sizes are reduced and the input feature dimensions are extended to allow adding more convolutional layers. Appropriate pooling, padding, and input feature map selection strategies are then investigated and applied to the very deep CNN to make it more robust for speech recognition. In addition, an in-depth analysis of the architecture reveals key characteristics such as compact model scale, fast convergence, and noise robustness. The proposed model is evaluated on two tasks: the Aurora4 task, with multiple additive noise types and channel mismatch, and the AMI meeting transcription task, with significant reverberation. Experiments on both tasks show that the proposed very deep CNNs can significantly reduce the word error rate (WER) for noise-robust speech recognition. The best architecture obtains a 10.0% relative WER reduction over the traditional CNN on AMI, competitive with the long short-term memory recurrent neural network (LSTM-RNN) acoustic model. On Aurora4, even without feature enhancement, model adaptation, or sequence training, it achieves a WER of 8.81%, a 17.0% relative improvement over the LSTM-RNN. To our knowledge, this is the best published result on Aurora4.
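
The core idea of the abstract, stacking many small convolutions with small pooling windows and padded ("same") convolutions so that depth can grow before the time-frequency maps shrink away, can be illustrated with a minimal PyTorch sketch. This is not the paper's implementation: the layer counts, channel widths, the 64-frame by 64-filterbank input size, the three feature streams (static/delta/delta-delta), and the 3,000-senone output are all assumptions chosen only to make the idea concrete.

```python
import torch
import torch.nn as nn


class VeryDeepCNN(nn.Module):
    """Illustrative very deep CNN acoustic model (a sketch, not the paper's
    exact configuration): many small 3x3 convolutions, small 2x2 pooling,
    and padded convolutions so more conv layers fit before the feature
    maps become too small."""

    def __init__(self, n_senones=3000, in_channels=3):
        super().__init__()

        def block(c_in, c_out, n_convs):
            # n_convs padded 3x3 convolutions followed by one small pooling step.
            layers = []
            for i in range(n_convs):
                layers += [
                    nn.Conv2d(c_in if i == 0 else c_out, c_out,
                              kernel_size=3, padding=1),  # small filters, "same" padding
                    nn.ReLU(inplace=True),
                ]
            layers.append(nn.MaxPool2d(kernel_size=2))      # small pooling window
            return nn.Sequential(*layers)

        self.features = nn.Sequential(
            block(in_channels, 64, 2),
            block(64, 128, 2),
            block(128, 256, 3),
            block(256, 256, 3),
        )
        # Assumes a 64-frame x 64-filterbank input, halved by each of the
        # four pooling layers: 64 -> 4 along both axes.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 4 * 4, 2048),
            nn.ReLU(inplace=True),
            nn.Linear(2048, n_senones),  # senone posteriors for the HMM decoder
        )

    def forward(self, x):
        return self.classifier(self.features(x))


if __name__ == "__main__":
    model = VeryDeepCNN()
    # Dummy batch: 8 windows, 3 feature streams, 64 frames x 64 filterbank bins.
    x = torch.randn(8, 3, 64, 64)
    print(model(x).shape)  # torch.Size([8, 3000])
```

The extended input context (more frames and filterbank channels than a conventional CNN front end) is what makes room for the repeated pooling steps; with a smaller input, four 2x2 poolings would collapse the feature maps before this many convolutional layers could be stacked.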
