Reverberation Model-Based Decoding in the Logmelspec Domain for Robust Distant-Talking Speech Recognition

Sehr A.; Maas R.; Kellermann W.


Abstract

The REMOS (REverberation MOdeling for Speech recognition) concept for reverberation-robust distant-talking speech recognition, introduced in “Distant-talking continuous speech recognition based on a novel reverberation model in the feature domain” (A. Sehr, in Proc. Interspeech, 2006, pp. 769–772) for melspectral features, is extended to logarithmic melspectral (logmelspec) features in this contribution. Thus, the favorable properties of REMOS, including its high flexibility with respect to changing reverberation conditions, become available in the more competitive logmelspec domain. Based on a combined acoustic model consisting of a hidden Markov model (HMM) network and a reverberation model (RM), REMOS determines clean-speech and reverberation estimates during recognition. To this end, in each iteration of a modified Viterbi algorithm, an inner optimization operation maximizes the joint density of the current HMM output and the RM output subject to the constraint that their combination is equal to the current reverberant observation. Since the combination operation in the logmelspec domain is nonlinear, numerical methods appear necessary for solving the constrained inner optimization problem. A novel reformulation of the constraint, which allows for an efficient solution by nonlinear optimization algorithms, is derived in this paper so that a practicable implementation of REMOS for logmelspec features becomes possible. An in-depth analysis of this REMOS implementation investigates the statistical properties of its reverberation estimates and thus derives possibilities for further improving the performance of REMOS. Connected digit recognition experiments show that the proposed REMOS version in the logmelspec domain significantly outperforms the melspec version. While the proposed RMs with parameters estimated by straightforward training for a given room are robust to a mismatch of the speaker–microphone distance, their performance significantly decreases if they are used in a room with substantially different conditions. However, by training multi-style RMs with data from several rooms, good performance can be achieved across different rooms.
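
The constrained inner optimization described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the paper's actual formulation: a single mel channel, scalar Gaussian densities for the HMM output `s` and the RM output `r`, and a combination operation modeled as `log(exp(s) + exp(r)) = x`. The hypothetical function `inner_optimization` substitutes `s = x + log(u)` and `r = x + log(1 - u)` with `u` in (0, 1), which satisfies the constraint by construction and leaves an unconstrained one-dimensional search, solved here by a plain grid search rather than a nonlinear optimizer.

```python
import numpy as np


def inner_optimization(x, mu_s, var_s, mu_r, var_r, n_grid=10001):
    """Toy inner optimization for one logmelspec channel (illustrative only).

    Maximizes N(s; mu_s, var_s) * N(r; mu_r, var_r)
    subject to log(exp(s) + exp(r)) = x.
    The Gaussian densities and the two-term log-sum-exp combination
    are assumptions of this sketch, not taken from the paper.
    """
    # u is the share of the observed energy attributed to the HMM output.
    # Endpoints 0 and 1 are dropped so that both logarithms stay finite.
    u = np.linspace(0.0, 1.0, n_grid + 2)[1:-1]

    # Reparameterization: exp(s) + exp(r) = exp(x) * (u + (1 - u)) = exp(x),
    # so every grid point satisfies the constraint.
    s = x + np.log(u)
    r = x + np.log1p(-u)

    # Log of the joint density of two independent Gaussians.
    log_joint = (
        -0.5 * (s - mu_s) ** 2 / var_s
        - 0.5 * (r - mu_r) ** 2 / var_r
        - 0.5 * np.log(2.0 * np.pi * var_s)
        - 0.5 * np.log(2.0 * np.pi * var_r)
    )

    best = int(np.argmax(log_joint))
    return float(s[best]), float(r[best]), float(log_joint[best])


# Example call with made-up parameters.
s_hat, r_hat, score = inner_optimization(
    x=0.0, mu_s=-0.5, var_s=1.0, mu_r=-1.0, var_r=1.0
)
```

In a decoder along these lines, a search of this kind would run once per Viterbi step and feature channel, and the resulting score would take the place of the usual HMM output probability. A gradient-based optimizer over `u` would be the natural replacement for the grid search when speed matters.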


Bibliographic details

Source: Audio, Speech, and Language Processing, IEEE Transactions on, 2010, no. 7, pp. 1676-1691 (16 pages)
Authors: Sehr A.; Maas R.; Kellermann W.
Affiliation: Chair of Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg, Erlangen, Germany
Original format: PDF
Language: English
Keywords: Acoustic modeling; distant-talking automatic speech recognition (ASR); model-based dereverberation; reverberation model; robust ASR


Similar literature

1. Making Machines Understand Us in Reverberant Rooms: Robustness Against Reverberation for Automatic Speech Recognition [J]. Yoshioka T., Sehr A., Delcroix M., Signal Processing Magazine, IEEE. 2012, no. 6
2. Combination of GMM-Based Speech Estimation Method and Temporal Domain SVD-Based Speech Enhancement for Noise Robust Speech Recognition [J]. Masakiyo Fujimoto, Yasuo Ariki, Systems and Computers in Japan. 2007, no. 3
3. A Bayesian view on acoustic model-based techniques for robust speech recognition [J]. Roland Maas, Christian Huemmer, Armin Sehr, EURASIP Journal on Advances in Signal Processing. 2015, no. 1
4. Model-based dereverberation in the logmelspec domain for robust distant-talking speech recognition [C]. Sehr, Armin; Maas, Roland; Kellermann, Walter. IEEE International Conference on Acoustics Speech and Signal; ICASSP 2010. 2010
5. Robust speech processing based on microphone array, audio-visual, and frame selection for in-vehicle speech recognition and in-set speaker recognition [D]. Zhang, Xianxian. 2005
6. Comparing the effects of reverberation and of noise on speech recognition in simulated electric-acoustic listening [O]. Kate Helms Tillery, Christopher A. Brown, Sid P. Bacon
7. Model-Based Dereverberation in the Logmelspec Domain for Robust Distant-Talking Speech Recognition [O]. Armin Sehr, Walter Kellermann. 2011
8. Cepstral Domain Talker Stress Compensation for Robust Speech Recognition [R]. Chen, Y. 1988
