Published in: European Signal Processing Conference

Integrating Denoising Autoencoder and Vector Taylor Series with Auditory Masking for Speech Recognition in Noisy Conditions

Abstract

We propose a new front-end feature compensation technique to improve the performance of Automatic Speech Recognition (ASR) systems in noisy environments. First, a Time Delay Neural Network (TDNN) based Denoising Autoencoder (DAE) is used to compensate the noisy features. The DAE provides a good performance gain when it has been trained on the noise present in the test utterances (“seen” conditions). However, if the noise in the test utterance differs from the noise used to train the DAE (“unseen” conditions), performance degrades considerably. To improve ASR performance in such unseen conditions, a model compensation technique, namely Vector Taylor Series with Auditory Masking (VTS-AM), is used. We propose a new Signal-to-Noise Ratio (SNR) based measure that can reliably choose the type of compensation to use for the best performance gain. We show that the proposed technique significantly improves ASR performance on the noise-corrupted TIMIT and Librispeech databases.
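
The abstract describes choosing between DAE and VTS-AM compensation from an SNR-based measure but gives no formula, threshold, or decision direction. The sketch below only illustrates the general idea of such a switch and is not the authors' method. Everything in it is an assumption made for illustration: the function names `estimate_snr_db` and `choose_compensation`, the idea of treating the quietest frames as noise-only, the `noise_fraction` parameter, and the threshold. The caller must state which compensation goes with which side of the threshold, because the abstract does not say.

```python
import math


def estimate_snr_db(frame_energies, noise_fraction=0.2):
    """Crude utterance-level SNR estimate in dB (hypothetical helper).

    Assumption, not from the paper: the quietest `noise_fraction` of the
    frames contain noise only, and the remaining frames contain speech
    plus noise.
    """
    if len(frame_energies) < 2:
        raise ValueError("need at least two frames")
    ordered = sorted(frame_energies)
    # Keep at least one frame on each side of the split.
    n_noise = max(1, int(len(ordered) * noise_fraction))
    n_noise = min(n_noise, len(ordered) - 1)
    noise_power = sum(ordered[:n_noise]) / n_noise
    noisy_speech_power = sum(ordered[n_noise:]) / (len(ordered) - n_noise)
    # Subtract the noise floor to approximate speech-only power;
    # clamp to a small positive value so the logarithm stays defined.
    speech_power = max(noisy_speech_power - noise_power, 1e-12)
    return 10.0 * math.log10(speech_power / max(noise_power, 1e-12))


def choose_compensation(snr_db, threshold_db, if_at_or_above, if_below):
    """Return one of two caller-supplied labels based on a threshold.

    The mapping between SNR and compensation type is left to the caller
    on purpose: the abstract does not specify it.
    """
    return if_at_or_above if snr_db >= threshold_db else if_below


# Example with made-up frame energies and an arbitrary threshold.
energies = [1.0, 1.0] + [11.0] * 8
snr = estimate_snr_db(energies)
choice = choose_compensation(snr, threshold_db=5.0,
                             if_at_or_above="dae", if_below="vts_am")
```

A real system would need a measure and threshold tuned on held-out data, and would apply the selected compensation per utterance before decoding.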