Comparative evaluation of modulation-transfer-function-based blind restoration of sub-band power envelopes of speech as a front-end processor for automatic speech recognition systems

Masashi Unoki; Masato Akagi; Xugang Lu


Abstract

To reduce speech degradation in reverberant environments, we previously proposed a modulation-transfer-function (MTF)-based method of speech dereverberation. By considering the temporal modulation properties of speech and the exponential decay of the power envelope of the room impulse response, we obtained the following MTF relation: the sub-band power envelope of reverberant speech can be represented as a convolution between the sub-band power envelope of clean speech and the power envelope of the room impulse response. On the basis of this relation, inverse MTF filtering can be applied to restore the power envelopes of reverberant speech. Because we model the power envelope of the impulse response as an exponential decay function, the room impulse response does not need to be measured at any time in this restoration. We tested how effective this method is as a front-end for automatic speech recognition (ASR) systems in artificial and real reverberant environments. Reverberant speech signals were created by convolving clean speech (AURORA-2J database) with artificially produced or real room impulse responses. A method based on the auditory power spectrum was used as a baseline for comparison. Compared with the baseline, the proposed method produced a 35.67% relative improvement in the error reduction rate for artificial reverberant environments (on average, for reverberation times from 0.2 to 2.0 s) and a 25.78% relative improvement in the error reduction rate for real reverberant environments (43 reverberant impulse responses). The results demonstrate that our new approach can improve the robustness of speech-recognition systems in reverberant environments, and that it performs better than conventional methods.
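The MTF relation and its inverse filtering can be illustrated with a minimal sketch for a single sub-band. This is not the authors' implementation. It assumes that the reverberation time is already known (the abstract does not say how the blind method obtains it), that the decay constant follows the usual definition of a 60 dB power drop over the reverberation time, and that the envelope is sampled at a rate fs. All function and parameter names are hypothetical.

```python
import numpy as np

# 60 dB power decay over t60 seconds: exp(-DECAY_60DB * t / t60), about 13.8
DECAY_60DB = 6.0 * np.log(10.0)

def exp_decay_envelope(t60, fs, n_samples, gain=1.0):
    """Power envelope of a room impulse response, modeled (assumption)
    as gain^2 * exp(-13.8 * t / t60)."""
    t = np.arange(n_samples) / fs
    return gain**2 * np.exp(-DECAY_60DB * t / t60)

def reverberate_envelope(clean_env, t60, fs, gain=1.0):
    """MTF relation: reverberant power envelope = clean power envelope
    convolved with the power envelope of the impulse response."""
    clean_env = np.asarray(clean_env, dtype=float)
    h_env = exp_decay_envelope(t60, fs, len(clean_env), gain)
    return np.convolve(clean_env, h_env)[:len(clean_env)]

def inverse_mtf_filter(reverb_env, t60, fs, gain=1.0):
    """Undo the exponential-decay convolution. The decay acts as a
    one-pole filter, so its inverse is a two-tap difference."""
    reverb_env = np.asarray(reverb_env, dtype=float)
    decay = np.exp(-DECAY_60DB / (t60 * fs))  # per-sample decay factor
    restored = reverb_env.copy()
    restored[1:] -= decay * reverb_env[:-1]
    return restored / gain**2

# Usage: a 4 Hz modulated envelope sampled at 100 Hz, t60 = 0.5 s
fs = 100
t = np.arange(fs) / fs
clean = 0.5 * (1.0 + np.sin(2.0 * np.pi * 4.0 * t))
reverberant = reverberate_envelope(clean, t60=0.5, fs=fs)
restored = inverse_mtf_filter(reverberant, t60=0.5, fs=fs)
```

With the same reverberation time used in both directions, the restored envelope matches the clean one up to floating-point error. In practice the reverberation time and gain would have to be estimated from the reverberant signal itself, which this sketch leaves out.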


Bibliographic information

Source: Acoustical science and technology, 2008, No. 6, 11 pages
Authors: Masashi Unoki; Masato Akagi; Xugang Lu
Original format: PDF
Chinese Library Classification: Acoustics

Similar literature

1. Comparative evaluation of modulation-transfer-function-based blind restoration of sub-band power envelopes of speech as a front-end processor for automatic speech recognition systems [J]. Xugang Lu, Masashi Unoki, Masato Akagi. Acoustical science and technology, 2008, No. 6.
2. Sub-band temporal modulation envelopes and their normalization for automatic speech recognition in reverberant environments [J]. Xugang Lu, Masashi Unoki, Satoshi Nakamura. Computer speech and language, 2011, No. 3.
3. Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference [J]. Byeongwook Lee, Kwang-Hyun Cho. Scientific reports, 2016, No. 1.
4. MTF-based Sub-band Power-envelope Restoration for Robust Speech Recognition in Noisy Reverberant Environments [C]. Shota Morita, Xugang Lu, Masashi Unoki. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2011.
5. Noise robust front-end processing for automatic speech recognition [D]. Zhu, Qifeng. 2001.
6. Brain-inspired speech segmentation for automatic speech recognition using the speech envelope as a temporal reference [O]. Byeongwook Lee, Kwang-Hyun Cho.
7. Comparative evaluation of modulation-transfer-function-based blind restoration of sub-band power envelopes of speech as a front-end processor for automatic speech recognition systems [O]. Lu, Xugang, Unoki, Masashi, Akagi, Masato. 2008.
