Journal: Audio, Speech, and Language Processing, IEEE Transactions on

A Spectral Masking Approach to Noise-Robust Speech Recognition Using Deep Neural Networks


Abstract

Improving the noise robustness of automatic speech recognition systems has been a challenging task for many years. Recently, it was found that Deep Neural Networks (DNNs) yield large performance gains over conventional GMM-HMM systems when used in both hybrid and tandem systems. However, they still fall far short of human expectations, especially in adverse environments. Motivated by the separation-prior-to-recognition process of the human auditory system, we propose a robust spectral masking system in which power spectral domain masks are predicted using a DNN trained on the same filter-bank features used for acoustic modeling. To further improve performance, Linear Input Network (LIN) adaptation is applied to both the mask estimator and the acoustic model DNNs. Since estimating LINs for the mask estimator requires stereo data, which is not available during testing, we propose using the LINs estimated for the acoustic model DNNs to adapt the mask estimators. Furthermore, we use the same set of weights obtained from pre-training for the input layers of both the mask estimator and the acoustic model DNNs to ensure better consistency when sharing LINs. Experimental results on the benchmark Aurora2 and Aurora4 tasks demonstrate the effectiveness of our system, which yields Word Error Rates (WERs) of 4.6% and 11.8%, respectively. Furthermore, simple averaging of posteriors from systems with and without spectral masking further reduces the WERs to 4.3% on Aurora2 and 11.4% on Aurora4.
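The three operations named in the abstract (masking in the power spectral domain, a linear input network, and posterior averaging) can be illustrated with a minimal sketch. The abstract does not give the mask definition, the LIN estimation procedure, or the network architectures, so everything below is an assumption made for illustration: the clipped clean-to-noisy ratio target, the identity initialisation, and all function names are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def ratio_mask(clean_power: Vector, noisy_power: Vector,
               floor: float = 1e-10) -> Vector:
    """Assumed training target: clean/noisy power per bin, clipped to [0, 1].

    Computing it needs stereo (parallel clean and noisy) data, which is
    why such a target is only available at training time.
    """
    return [min(1.0, max(0.0, c / max(n, floor)))
            for c, n in zip(clean_power, noisy_power)]


def apply_mask(mask: Vector, noisy_power: Vector) -> Vector:
    """Element-wise masking of a noisy power spectrum."""
    return [m * n for m, n in zip(mask, noisy_power)]


def identity_lin(dim: int) -> Tuple[Matrix, Vector]:
    """A linear input layer set to the identity (W = I, b = 0).

    Assumption: adaptation would start from this point and update
    only W and b, leaving the rest of the network fixed.
    """
    weights = [[1.0 if i == j else 0.0 for j in range(dim)]
               for i in range(dim)]
    return weights, [0.0] * dim


def apply_lin(weights: Matrix, bias: Vector, features: Vector) -> Vector:
    """Affine transform W x + b applied to the input features."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def average_posteriors(p_masked: Vector, p_unmasked: Vector) -> Vector:
    """Unweighted average of two posterior vectors over the same states."""
    return [0.5 * (a + b) for a, b in zip(p_masked, p_unmasked)]


# Toy values, not data from the paper.
clean = [1.0, 0.5, 2.0]
noisy = [2.0, 0.5, 1.0]
mask = ratio_mask(clean, noisy)      # [0.5, 1.0, 1.0]
enhanced = apply_mask(mask, noisy)   # [1.0, 0.5, 1.0]

# One LIN shared by both networks: the same (W, b) would transform the
# inputs of the mask estimator and of the acoustic model.
W, b = identity_lin(3)
adapted = apply_lin(W, b, noisy)     # unchanged while W = I and b = 0

combined = average_posteriors([0.75, 0.25], [0.25, 0.75])  # [0.5, 0.5]
```

In this sketch, sharing the LIN simply means passing the same `W` and `b` to the front of both networks, which is plausible only if their input layers expect features in the same space; the abstract's use of common pre-trained input-layer weights appears to serve that purpose.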
