Source: IEEE Transactions on Audio, Speech, and Language Processing

Cepstrum-Domain Model Combination Based on Decomposition of Speech and Noise Using MMSE-LSA for ASR in Noisy Environments


Abstract

This paper presents an efficient method for combining models of speech and noise for robust speech recognition in noisy environments. The method decomposes the cepstrum-domain representation of noise-corrupted speech into clean speech cepstrum and background noise cepstrum components using a minimum mean squared error log-spectral amplitude (MMSE-LSA) criterion. Speech recognition is then performed on noisy cepstrum-domain observations using a model formed by the parallel combination of cepstrum-domain clean speech distributions and background noise distributions estimated using this MMSE-LSA-based noise decomposition. This method is far more efficient than other parallel model combination (PMC) procedures because model combination is performed directly in the cepstrum domain rather than in the linear spectral domain. Whereas existing PMC procedures treat background noise model estimation as a separate issue, this method explicitly incorporates a mechanism to continually update background noise models and signal-to-noise ratio (SNR) estimates over time. The proposed cepstrum-domain model combination method is compared with a well-known implementation of PMC, which uses a log-normal approximation when combining speech and background noise model means and variances. The comparison uses a connected digit string recognition task subject to mismatched channel and environment conditions. The results show that the proposed model combination technique gives a word error rate comparable to that of PMC when background noise information and SNR are known prior to estimation. The paper also presents results of experiments in which a combination of cepstrum-domain feature compensation and model combination is applied to this task.
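For orientation, the baseline that the abstract compares against (PMC with a log-normal approximation) can be sketched as follows. This is a minimal illustration of the standard textbook form of that approximation, not the paper's proposed cepstrum-domain method and not the authors' code. It assumes diagonal Gaussians already expressed in the log-spectral domain, so the cepstrum-to-log-spectrum transform (an inverse DCT in a full system) is omitted. The function names and the `gain` parameter are hypothetical.

```python
import numpy as np


def lognormal_to_linear(mu_log, var_log):
    """Map per-bin log-domain Gaussian moments to linear-domain moments,
    assuming the linear-domain variable is log-normally distributed."""
    mu_lin = np.exp(mu_log + 0.5 * var_log)
    var_lin = mu_lin ** 2 * (np.exp(var_log) - 1.0)
    return mu_lin, var_lin


def linear_to_lognormal(mu_lin, var_lin):
    """Inverse mapping: linear-domain moments back to log-domain moments."""
    var_log = np.log(var_lin / mu_lin ** 2 + 1.0)
    mu_log = np.log(mu_lin) - 0.5 * var_log
    return mu_log, var_log


def pmc_lognormal(mu_s, var_s, mu_n, var_n, gain=1.0):
    """Combine speech and noise log-spectral Gaussians (diagonal covariance).

    Speech and noise are assumed independent and additive in the linear
    spectral domain; `gain` is a hypothetical speech level scaling term.
    """
    s_mu, s_var = lognormal_to_linear(np.asarray(mu_s, float),
                                      np.asarray(var_s, float))
    n_mu, n_var = lognormal_to_linear(np.asarray(mu_n, float),
                                      np.asarray(var_n, float))
    # Moments of a sum of independent variables add in the linear domain.
    mu_lin = gain * s_mu + n_mu
    var_lin = gain ** 2 * s_var + n_var
    return linear_to_lognormal(mu_lin, var_lin)


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    mu, var = pmc_lognormal([2.0, 1.0], [0.3, 0.2], [0.5, 1.5], [0.1, 0.1])
    print(mu, var)
```

The round trip through the linear spectral domain for every model component is the cost that the abstract says the proposed method avoids by combining models directly in the cepstrum domain.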
