Journal: IEEE Transactions on Audio, Speech, and Language Processing

Reverberation Model-Based Decoding in the Logmelspec Domain for Robust Distant-Talking Speech Recognition



Abstract

The REMOS (REverberation MOdeling for Speech recognition) concept for reverberation-robust distant-talking speech recognition, introduced in “Distant-talking continuous speech recognition based on a novel reverberation model in the feature domain” (A. Sehr et al., in Proc. Interspeech, 2006, pp. 769–772) for melspectral features, is extended to logarithmic melspectral (logmelspec) features in this contribution. Thus, the favorable properties of REMOS, including its high flexibility with respect to changing reverberation conditions, become available in the more competitive logmelspec domain. Based on a combined acoustic model consisting of a hidden Markov model (HMM) network and a reverberation model (RM), REMOS determines clean-speech and reverberation estimates during recognition. To this end, in each iteration of a modified Viterbi algorithm, an inner optimization operation maximizes the joint density of the current HMM output and the RM output subject to the constraint that their combination is equal to the current reverberant observation. Since the combination operation in the logmelspec domain is nonlinear, numerical methods appear necessary for solving the constrained inner optimization problem. A novel reformulation of the constraint, which allows for an efficient solution by nonlinear optimization algorithms, is derived in this paper so that a practicable implementation of REMOS for logmelspec features becomes possible. An in-depth analysis of this REMOS implementation investigates the statistical properties of its reverberation estimates and thus derives possibilities for further improving the performance of REMOS. Connected digit recognition experiments show that the proposed REMOS version in the logmelspec domain significantly outperforms the melspec version. While the proposed RMs with parameters estimated by straightforward training for a given room are robust to a mismatch of the speaker–microphone distance, their performance significantly decreases if they are used in a room with substantially different conditions. However, by training multi-style RMs with data from several rooms, good performance can be achieved across different rooms.
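
The constrained inner optimization described in the abstract can be illustrated with a small sketch. This is not the reformulation derived in the paper, which the abstract does not spell out. It is a toy version for one frame and one mel channel under the following assumptions: the HMM output `s` and the RM output `r` are scalar Gaussians in the logmelspec domain, their combination is modeled as `x = log(exp(s) + exp(r))`, and the constraint is removed by writing the clean-speech share of the observed energy as `sigmoid(t)`. The names `inner_optimize`, `split`, and all parameters are hypothetical.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def gauss_logpdf(v, mean, var):
    """Log density of a scalar Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (v - mean) ** 2 / var)


def split(x, t):
    """Map an unconstrained t to (s, r) with log(exp(s) + exp(r)) == x.

    Assumption: u = sigmoid(t) is the fraction of the observed mel energy
    attributed to clean speech, so s = x + log(u) and r = x + log(1 - u).
    """
    return x - softplus(-t), x - softplus(t)


def golden_max(f, a, b, iters=60):
    """Golden-section search for a local maximum of f on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def inner_optimize(x, mu_s, var_s, mu_r, var_r, t_lo=-20.0, t_hi=20.0, n_grid=401):
    """Maximize log p(s) + log p(r) subject to log(exp(s) + exp(r)) == x.

    A coarse grid over t picks a bracket, then golden-section search refines it.
    Returns the estimates (s, r) and the joint log density at the optimum.
    """
    def score(t):
        s, r = split(x, t)
        return gauss_logpdf(s, mu_s, var_s) + gauss_logpdf(r, mu_r, var_r)

    step = (t_hi - t_lo) / (n_grid - 1)
    t_best = max((t_lo + i * step for i in range(n_grid)), key=score)
    t_opt = golden_max(score, max(t_lo, t_best - step), min(t_hi, t_best + step))
    s, r = split(x, t_opt)
    return s, r, score(t_opt)


if __name__ == "__main__":
    s_hat, r_hat, joint = inner_optimize(x=2.0, mu_s=1.5, var_s=0.5, mu_r=0.0, var_r=1.0)
    print(s_hat, r_hat, joint)
```

In a decoder along the lines the abstract describes, a routine of this kind would be called once per candidate HMM state and frame inside the modified Viterbi recursion, and the returned joint density would take the place of the usual output density. The one-dimensional search works here only because the toy problem is scalar; the paper refers to general nonlinear optimization algorithms.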
