Robustness to speaker position in distant-talking automatic speech recognition



Abstract

In this paper, we present a method that significantly improves on our previous work in single-channel dereverberation. The proposed method is more robust to changes in speaker position in distant-talking ASR. First, we update the room transfer function (RTF) and the weighting parameters for dereverberation to match the target speaker position. This scheme corrects speech power variation as a function of position at the waveform level, and its impact on the acoustic model is verified. Then, we implement a fast acoustic model update that reflects the speech power level at the target speaker position. The model update scheme is simple and avoids time-consuming model re-estimation, so the proposed method can be executed online. Together, these corrective measures significantly reduce the mismatch between training and testing conditions. We test our method using real reverberant data from different locations inside the room. Experimental results show that the proposed method outperforms the conventional methods in terms of ASR performance. Moreover, our fast acoustic model update scheme is on par with time-consuming model re-estimation in terms of recognition performance.
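The abstract does not give the details of the fast acoustic model update, so the following is only a minimal sketch of the general idea that a waveform-level power change can be mirrored in a model by a constant offset rather than by re-estimation. It assumes a single log-energy feature per frame; the function names `log_energy` and `shift_log_energy_mean`, the sample frame, and the gain value are all hypothetical and are not taken from the paper.

```python
import math


def log_energy(frame):
    """Natural log of the sum of squared samples in one frame."""
    return math.log(sum(s * s for s in frame))


def shift_log_energy_mean(mean, gain):
    """Shift a Gaussian mean on the log-energy dimension.

    Scaling a waveform by `gain` scales its energy by gain**2, so the
    log-energy moves by the constant 2 * log(gain).
    """
    return mean + 2.0 * math.log(gain)


# Hypothetical frame and position-dependent gain (illustrative values only).
frame = [0.10, -0.20, 0.30, -0.05]
gain = 0.5  # assumed attenuation at a more distant speaker position

original = log_energy(frame)
rescaled = log_energy([gain * s for s in frame])
updated_mean = shift_log_energy_mean(original, gain)

# The shifted mean matches the log-energy of the rescaled waveform,
# so no statistics need to be re-collected for this dimension.
assert math.isclose(rescaled, updated_mean)
```

Under these assumptions the update costs one addition per Gaussian mean, which is consistent with the abstract's claim that a simple update can run online. Whether the paper's scheme takes this exact form is not stated in the abstract.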


Bibliographic details

Source
IEEE International Conference on Acoustics, Speech and Signal Processing | 2013 | pp. 7034-7038 | 5 pages
Authors
Gomez Randy; Nakamura Keisuke; Nakadai Kazuhiro;
Original format: PDF
Keywords
Automatic Speech Recognition; Dereverberation; Robustness; Speech Enhancement;


Similar literature


1. Reverberation Model-Based Decoding in the Logmelspec Domain for Robust Distant-Talking Speech Recognition [J] . Sehr A., Maas R., Kellermann W. Audio, Speech, and Language Processing, IEEE Transactions on . 2010, No. 7
2. Robust distant speaker recognition based on position-dependent CMN by combining speaker-specific GMM with speaker-adapted HMM [J] . Longbiao Wang, Norihide Kitaoka, Seiichi Nakagawa Speech Communication . 2007, No. 6
3. TEnet: target speaker extraction network with accumulated speaker embedding for automatic speech recognition [J] . Li Wenjie, Zhang Pengyuan, Yan Yonghong Electronics Letters . 2019, No. 14
4. ROBUSTNESS TO SPEAKER POSITION IN DISTANT-TALKING AUTOMATIC SPEECH RECOGNITION [C] . Randy Gomez, Keisuke Nakamura, Kazuhiro Nakadai IEEE International Conference on Acoustics, Speech and Signal Processing . 2013
5. Environmental and speaker robustness in automatic speech recognition with limited learning data. [D] . Cui, Xiaodong. 2005
6. Recognizing the message and the messenger: biomimetic spectral analysis for robust speech and speaker recognition [O] . Sridhar Krishna Nemala, Kailash Patil, Mounya Elhilali
7. Automatic Speech recognition, with large vocabulary, robustness, independence of speaker and multilingual processing [O] . CAON D. R. S. 2010
8. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment. [R] . Hansen, J. H. 2015
