Journal: IEEE Transactions on Audio, Speech, and Language Processing

MMSE-Based Missing-Feature Reconstruction With Temporal Modeling for Robust Speech Recognition



Abstract

This paper addresses the problem of feature compensation in the log-spectral domain by using the missing-data (MD) approach to noise-robust speech recognition, which assumes that each log-spectral feature is either almost unaffected by noise or completely masked by it. First, a general MD framework based on minimum mean square error (MMSE) estimation is introduced, which exploits the correlation across frequency bands to reconstruct the missing features. This framework allows the derivation of different MD imputation approaches and, in particular, a novel technique that takes advantage of truncated Gaussian distributions is presented. While the proposed technique provides excellent results at high and medium signal-to-noise ratios (SNRs), its performance diminishes at low SNRs, where very few reliable features are available. The reconstruction technique is therefore extended to exploit temporal constraints using two different approaches. In the first approach, time-frequency patches of speech containing a number of consecutive frames are modeled using a Gaussian mixture model (GMM). In the second, the sequential structure of speech is instead modeled by a hidden Markov model (HMM). The proposed techniques are evaluated on the Aurora-2 and Aurora-4 databases using both oracle and estimated masks. In both cases, the proposed techniques achieve better recognition performance than the baseline system and other related techniques. Also, the introduction of temporal modeling turns out to be very effective in reconstructing spectra at low SNRs. In particular, HMMs show the highest capability of accounting for time correlations and, therefore, achieve the best results.
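The idea of MMSE imputation with truncated Gaussians can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes a GMM with diagonal covariances as the clean-speech prior (so cross-band correlation enters only through the mixture components), a binary reliability mask, and that the observed noisy value is an upper bound on the masked clean feature. All function and parameter names (`mmse_impute`, `truncated_mean`, `weights`, `means`, `stds`) are hypothetical.

```python
import math


def log_normal_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(z):
    """Standard normal CDF; erfc keeps precision in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def truncated_mean(mu, sigma, upper):
    """Mean of N(mu, sigma^2) restricted to x <= upper."""
    alpha = (upper - mu) / sigma
    cdf = normal_cdf(alpha)
    if cdf < 1e-300:
        # Bound lies far below the mean: fall back to the bound itself
        # (an approximation chosen for this sketch).
        return upper
    pdf = math.exp(-0.5 * alpha * alpha) / math.sqrt(2.0 * math.pi)
    return mu - sigma * pdf / cdf


def mmse_impute(y, reliable, weights, means, stds):
    """Estimate clean log-spectral features for one frame.

    y        : observed noisy features, one value per frequency band
    reliable : booleans, True where the band is assumed noise-free
    weights, means, stds : assumed diagonal-covariance GMM prior
    """
    # Component log-posteriors: reliable bands contribute a density,
    # masked bands contribute the probability mass below the bound y[d].
    log_post = []
    for w, mu, sd in zip(weights, means, stds):
        lp = math.log(w)
        for d in range(len(y)):
            if reliable[d]:
                lp += log_normal_pdf(y[d], mu[d], sd[d])
            else:
                mass = normal_cdf((y[d] - mu[d]) / sd[d])
                lp += math.log(max(mass, 1e-300))
        log_post.append(lp)

    # Normalize with log-sum-exp for numerical stability.
    top = max(log_post)
    post = [math.exp(lp - top) for lp in log_post]
    total = sum(post)
    post = [p / total for p in post]

    # MMSE estimate: keep reliable bands, replace masked bands with the
    # posterior-weighted mean of the truncated component Gaussians.
    x_hat = []
    for d in range(len(y)):
        if reliable[d]:
            x_hat.append(y[d])
        else:
            x_hat.append(sum(
                p * truncated_mean(mu[d], sd[d], y[d])
                for p, mu, sd in zip(post, means, stds)
            ))
    return x_hat
```

A call such as `mmse_impute([10.0, 20.0], [True, False], [0.5, 0.5], [[0.0, 0.0], [10.0, 10.0]], [[1.0, 1.0], [1.0, 1.0]])` keeps the first band and reconstructs the second from the component that best explains the reliable band. With few reliable bands the component posteriors become nearly uninformative, which is consistent with the low-SNR degradation the abstract describes and with its motivation for adding temporal models.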
