IEEE Transactions on Audio, Speech, and Language Processing

Enhanced Speech Features by Single-Channel Joint Compensation of Noise and Reverberation


Abstract

For natural verbal communication between humans and machines, automatic speech recognition that works reasonably well on recordings captured with mid- or far-field microphones is essential. While much research and development has been devoted to addressing one of the two distortions frequently encountered in mid- and far-field sound pickup, namely noise or reverberation, less effort has been made to combat both kinds of distortion jointly. In our view, however, joint compensation is essential to further reduce the detrimental effect of moving the microphone away from the speaker's mouth, because both kinds of distortion are present in real environments. In this paper, we take a first step in this direction by integrating an estimate of the reverberation energy, derived from an auxiliary model based on multistep linear prediction, into a framework that has so far tracked and removed nonstationary additive distortion with particle filters in a low-dimensional logarithmic power frequency domain. On actual recordings with different speaker-to-microphone distances, we observe that combating either nonstationary noise or reverberation alone, in the feature space and on a single channel, already improves speech recognition performance both before and after acoustic model adaptation. Furthermore, we observe that simply concatenating techniques that address either additive noise or reverberation can further improve accuracy in some cases. Finally, we demonstrate that the joint estimation and removal of both kinds of distortion, as proposed in this publication, further improves the accuracy of the recognized text.
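The abstract names multistep linear prediction as the source of the reverberation energy estimate but gives no details. As a rough illustration only, here is a minimal NumPy sketch of that general technique, assuming a full-band time-domain signal, a plain least-squares fit, and placeholder values for the prediction order and delay. The function names (`mslp_coefficients`, `predict_late_reverb`, `frame_log_energy`) and the synthetic room response are hypothetical. This is not the authors' implementation, and it leaves out the particle-filter noise tracking entirely. The key idea is that the predictor only sees samples at least `delay` steps in the past, so it can only explain signal components that persist longer than that, which is roughly the late-reverberation tail.

```python
import numpy as np


def mslp_coefficients(x: np.ndarray, order: int, delay: int) -> np.ndarray:
    """Fit multistep linear prediction coefficients by least squares.

    Models x[n] ~ sum_k c[k] * x[n - delay - k] for k = 0 .. order - 1.
    Because the most recent `delay` samples are skipped, short-term
    correlation (direct sound, early reflections) cannot be exploited, so the
    predictor mainly explains the slowly decaying late-reverberation tail.
    """
    x = np.asarray(x, dtype=float)
    start = delay + order - 1  # first index n with a complete delayed history
    # Column k holds x[n - delay - k] for n = start .. len(x) - 1.
    A = np.stack(
        [x[start - delay - k : len(x) - delay - k] for k in range(order)], axis=1
    )
    b = x[start:]
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs


def predict_late_reverb(x: np.ndarray, coeffs: np.ndarray, delay: int) -> np.ndarray:
    """Apply the delayed predictor; samples before the signal start count as zero."""
    x = np.asarray(x, dtype=float)
    late = np.zeros_like(x)
    for k, c in enumerate(coeffs):
        shift = delay + k
        late[shift:] += c * x[: len(x) - shift]
    return late


def frame_log_energy(y: np.ndarray, frame_len: int, hop: int, floor: float = 1e-10) -> np.ndarray:
    """Per-frame log energy, floored to avoid log(0)."""
    n_frames = 1 + (len(y) - frame_len) // hop
    energies = np.array(
        [np.sum(y[i * hop : i * hop + frame_len] ** 2) for i in range(n_frames)]
    )
    return np.log(np.maximum(energies, floor))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs = 8000  # hypothetical sampling rate
    dry = rng.standard_normal(fs)  # 1 s of white noise standing in for speech
    t = np.arange(int(0.3 * fs)) / fs
    # Hypothetical room impulse response: exponentially decaying noise.
    rir = rng.standard_normal(t.size) * np.exp(-t / 0.05)
    wet = np.convolve(dry, rir)[: dry.size]

    delay = int(0.03 * fs)  # skip the most recent 30 ms (placeholder value)
    coeffs = mslp_coefficients(wet, order=200, delay=delay)
    late = predict_late_reverb(wet, coeffs, delay)
    log_energy = frame_log_energy(late, frame_len=200, hop=80)
    print(log_energy.shape)
```

The per-frame log energies of the predicted signal stand in for a reverberation energy estimate in a log power domain. In the paper, such an estimate is integrated into a particle-filter framework operating on a low-dimensional logarithmic power spectrum, which this sketch does not attempt to reproduce.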
