Published in: IEEE Transactions on Audio, Speech and Language Processing

Static and Dynamic Variance Compensation for Recognition of Reverberant Speech With Dereverberation Preprocessing


Abstract

The performance of automatic speech recognition degrades severely in the presence of noise or reverberation. Noise robustness has been studied extensively. In contrast, the recognition of reverberant speech has received far less attention and remains very challenging. In this paper, we use a dereverberation method to reduce reverberation prior to recognition. Such a preprocessor may remove most reverberation effects, but it often introduces distortion, causing a dynamic mismatch between the speech features and the acoustic model used for recognition. Model adaptation could be used to reduce this mismatch. However, conventional model adaptation techniques assume a static mismatch and may therefore not cope well with the dynamic mismatch arising from dereverberation. This paper proposes a novel adaptation scheme that is capable of managing both static and dynamic mismatches. We introduce a parametric model for variance adaptation that includes static and dynamic components in order to realize an appropriate interconnection between dereverberation and a speech recognizer. The model parameters are optimized using adaptive training implemented with the Expectation Maximization algorithm. In an experiment on reverberant speech with a reverberation time of 0.5 s, the proposed method achieved an 80% relative reduction in error rate compared with the recognition of dereverberated speech (word error rate of 31%). The final error rate of 5.4% was obtained by combining the proposed variance compensation with MLLR adaptation.
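
The abstract does not give the exact form of the variance model, so the following is only a minimal sketch of the general idea, under assumptions of our own: the compensated variance of a one-dimensional Gaussian is taken to be the model variance plus a static term (a fixed scaling of the model variance) plus a dynamic term (a scaling of a per-frame uncertainty value). The function name `compensated_log_likelihood`, the parameters `static_scale` and `dynamic_scale`, and the toy numbers are all hypothetical. The scales are set by hand here, whereas the paper estimates its parameters with EM-based adaptive training.

```python
import math


def compensated_log_likelihood(frames, dynamic_vars, mean, model_var,
                               static_scale=0.0, dynamic_scale=0.0):
    """Sum of per-frame 1-D Gaussian log-likelihoods with an inflated variance.

    ASSUMED form (not taken from the paper):
        var_t = model_var + static_scale * model_var + dynamic_scale * dynamic_vars[t]
    The static term is identical for every frame; the dynamic term changes
    from frame to frame.
    """
    total = 0.0
    for x, d in zip(frames, dynamic_vars):
        var = model_var + static_scale * model_var + dynamic_scale * d
        total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
    return total


# Toy data: the third frame is distorted, and its per-frame uncertainty is large.
frames = [0.1, -0.2, 3.0]
dynamic_vars = [0.0, 0.0, 4.0]

# No compensation: plain Gaussian with the model variance.
baseline = compensated_log_likelihood(frames, dynamic_vars, mean=0.0, model_var=1.0)

# Hand-picked compensation scales (hypothetical values).
compensated = compensated_log_likelihood(frames, dynamic_vars, mean=0.0, model_var=1.0,
                                         static_scale=0.1, dynamic_scale=1.0)

print(baseline, compensated)
```

In this toy case the distorted frame is penalized less once its variance is inflated, so the compensated log-likelihood is higher than the baseline. This illustrates why a per-frame (dynamic) variance term can help where a purely static adjustment would treat all frames alike.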
