Conference: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Normalized amplitude modulation features for large vocabulary noise-robust speech recognition



Abstract

Background noise and channel degradations seriously constrain the performance of state-of-the-art speech recognition systems. Studies comparing human and automatic speech recognition indicate that the human auditory system is far more robust to background noise and channel variability than automated systems. A traditional way to add robustness to a speech recognition system is to construct a robust feature set for the recognition model. In this work, we present an amplitude modulation feature derived from Teager's nonlinear energy operator that is power normalized and cosine transformed to produce normalized modulation cepstral coefficient (NMCC) features. The proposed NMCC features are compared with state-of-the-art noise-robust features on Aurora-2 and on a renoised Wall Street Journal (WSJ) corpus. The WSJ word-recognition experiments were performed on both the clean and the artificially renoised WSJ corpus using SRI's DECIPHER large vocabulary speech recognition system, under three train-test conditions: (a) matched, (b) mismatched, and (c) multi-conditioned. The Aurora-2 digit recognition task was performed using the standard HTK recognizer distributed with Aurora-2. Our results indicate that the proposed NMCC features demonstrated noise robustness in almost all train-test conditions of the renoised WSJ data and also improved digit recognition accuracy on Aurora-2 compared to MFCCs and state-of-the-art noise-robust features.
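The abstract names three processing steps (Teager's nonlinear energy operator, power normalization, cosine transform) without giving details. The following is a minimal sketch of how such a chain could look, not the authors' implementation. The discrete Teager operator psi[n] = x[n]^2 - x[n-1]*x[n+1] is standard; everything else is an assumption made for illustration: the function names, the per-band input layout, the use of mean absolute Teager energy as a stand-in for amplitude modulation power, and the compression exponent `root`.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns len(x) - 2 values, since the two end samples lack a neighbor.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def dct_ii(values):
    """Unnormalized DCT-II, used here as the cosine transform step."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def sketch_features(band_signals, root=0.1):
    """Hypothetical feature chain for ONE analysis frame.

    band_signals: list of sample lists, one per filterbank band
                  (assumed layout; each band needs at least 3 samples).
    root:         compression exponent (arbitrary illustrative value,
                  not taken from the paper).
    """
    compressed = []
    for band in band_signals:
        psi = teager_energy(band)
        # Assumption: mean absolute Teager energy as a crude power estimate.
        power = sum(abs(p) for p in psi) / len(psi)
        compressed.append(power ** root)
    return dct_ii(compressed)
```

For a pure sinusoid A*cos(w*n), the Teager operator returns the constant A^2 * sin^2(w), which is why it is commonly used to track amplitude and frequency modulation. A call such as `sketch_features([band0, band1, ...])` yields one coefficient per band for the frame.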
