Journal: Acoustical Science and Technology

Comparative evaluation of modulation-transfer-function-based blind restoration of sub-band power envelopes of speech as a front-end processor for automatic speech recognition systems



Abstract

To reduce speech degradation in reverberant environments, we previously proposed a modulation-transfer-function (MTF)-based method of speech dereverberation. By considering the temporal modulation properties of speech and the exponential decay of the power envelope of the room impulse response, we obtained the following MTF relation: the sub-band power envelope of reverberant speech can be represented as a convolution of the sub-band power envelope of clean speech with the power envelope of the room impulse response. On the basis of this relation, inverse MTF filtering can be applied to restore the power envelopes of reverberant speech. Because we model the power envelope of the impulse response as an exponential decay function, the room impulse response does not need to be measured for this restoration. We tested how effective this method is as a front-end for automatic speech recognition (ASR) systems in artificial and real reverberant environments. Reverberant speech signals were created by convolving clean speech (AURORA-2J database) with artificially produced or real room impulse responses. A method based on the auditory power spectrum was used as the baseline for comparison. Compared with the baseline, the proposed method produced a 35.67% relative improvement in the error reduction rate in artificial reverberant environments (on average, for reverberation times from 0.2 to 2.0 s) and a 25.78% relative improvement in real reverberant environments (43 reverberant impulse responses). The results demonstrate that our approach can improve the robustness of speech-recognition systems in reverberant environments and that it performs better than conventional methods.
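The MTF relation and its inversion can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that the power envelope of the impulse response decays by 60 dB over the reverberation time, that the envelope is sampled at a rate `fs`, and that the reverberation time `t_r` is already known; how the method obtains this parameter blindly is not covered here. The function names and the `gain` parameter are hypothetical. Under these assumptions, convolution with a sampled exponential is a first-order recursion, so the inverse filter is a two-tap difference.

```python
import numpy as np


def decay_factor(t_r, fs):
    """Per-sample decay of the power envelope (assumed: 60 dB drop after t_r seconds)."""
    return np.exp(-6.0 * np.log(10.0) / (t_r * fs))


def reverberate_envelope(clean_env, t_r, fs, gain=1.0):
    """Forward MTF relation: clean power envelope convolved with an exponential decay."""
    n = np.arange(len(clean_env))
    h_env = gain * decay_factor(t_r, fs) ** n
    # Keep only the first len(clean_env) samples of the full convolution.
    return np.convolve(clean_env, h_env)[: len(clean_env)]


def restore_envelope(reverb_env, t_r, fs, gain=1.0):
    """Inverse filtering: x[n] = (y[n] - r * y[n-1]) / gain, with y[-1] taken as 0."""
    r = decay_factor(t_r, fs)
    restored = np.empty_like(reverb_env, dtype=float)
    restored[0] = reverb_env[0]
    restored[1:] = reverb_env[1:] - r * reverb_env[:-1]
    # A power envelope cannot be negative, so clip any negative values.
    return np.maximum(restored / gain, 0.0)


# Usage: a synthetic envelope with 4 Hz modulation, sampled at 100 Hz for 2 s.
fs = 100.0
t = np.arange(0, 2.0, 1.0 / fs)
clean = 1.0 + 0.5 * np.sin(2 * np.pi * 4.0 * t)
reverb = reverberate_envelope(clean, t_r=0.5, fs=fs)
restored = restore_envelope(reverb, t_r=0.5, fs=fs)
```

With a matching `t_r`, the restored envelope reproduces the clean one up to floating-point error. With a mismatched value the restoration is only approximate, which is why the estimate of the decay parameter matters in a blind setting.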
