Journal: Computer Speech and Language

Speech enhancement for robust automatic speech recognition: Evaluation using a baseline system and instrumental measures


Abstract

Automatic speech recognition in everyday environments must be robust to significant levels of reverberation and noise. One strategy to achieve such robustness is multi-microphone speech enhancement. In this study, we present results of an evaluation of different speech enhancement pipelines using a state-of-the-art ASR system for a wide range of reverberation and noise conditions. The evaluation exploits the recently released ACE Challenge database which includes measured multichannel acoustic impulse responses from 7 different rooms with reverberation times ranging from 0.33 to 1.34 s. The reverberant speech is mixed with ambient, fan and babble noise recordings made with the same microphone setups in each of the rooms. In the first experiment, performance of the ASR without speech processing is evaluated. Results clearly indicate the deleterious effect of both noise and reverberation. In the second experiment, different speech enhancement pipelines are evaluated with relative word error rate reductions of up to 82%. Finally, the ability of selected instrumental metrics to predict ASR performance improvement is assessed. The best performing metric, Short-Time Objective Intelligibility Measure, is shown to have a Pearson correlation coefficient of 0.79, suggesting that it is a useful predictor of algorithm performance in these tests.
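The two summary figures quoted in the abstract, a relative word error rate (WER) reduction and a Pearson correlation coefficient, can be illustrated with a minimal sketch. The function names and every numeric value below are hypothetical and chosen only for illustration; they are not data from the study, and the abstract does not say how the authors computed these quantities.

```python
import math
from typing import Sequence


def relative_wer_reduction(wer_baseline: float, wer_enhanced: float) -> float:
    """Fraction by which WER drops relative to the unprocessed baseline."""
    if wer_baseline <= 0:
        raise ValueError("baseline WER must be positive")
    return (wer_baseline - wer_enhanced) / wer_baseline


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Unnormalised standard deviations; the shared normalisation cancels in the ratio.
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (dev_x * dev_y)


if __name__ == "__main__":
    # Hypothetical example: WER falls from 50% (unprocessed) to 9% (enhanced).
    print(relative_wer_reduction(50.0, 9.0))

    # Hypothetical per-condition improvements in an instrumental metric
    # and the corresponding WER improvements in percentage points.
    metric_gain = [0.02, 0.05, 0.08, 0.11, 0.15]
    wer_gain = [1.0, 6.0, 5.0, 12.0, 14.0]
    print(pearson(metric_gain, wer_gain))
```

In this reading, a relative reduction of 0.82 means the enhanced system makes 82% fewer word errors than the unprocessed baseline, and a correlation near 1 means that improvements in the instrumental metric track improvements in ASR performance closely.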
