EURASIP Journal on Audio, Speech, and Music Processing

Compensation of SNR and noise type mismatch using an environmental sniffing based speech recognition solution


Abstract

Multiple-model based speech recognition (MMSR) has been shown to be quite successful in noisy speech recognition. Because it employs multiple hidden Markov model (HMM) sets that correspond to various noise types and signal-to-noise ratio (SNR) values, the selected acoustic model can be closely matched to the noisy test speech, which improves performance over other state-of-the-art speech recognition systems that employ a single HMM set. However, the number of HMM sets is usually limited by practical considerations and by the need for effective model selection, so acoustic mismatch can still be a problem in MMSR. In this study, we propose methods to improve recognition performance by mitigating the mismatch in SNR and noise type for an MMSR solution. For the SNR mismatch, an optimal SNR mapping between the noisy test speech and the HMM was determined by experimental investigation. Employing this SNR mapping, instead of using the estimated SNR of the noisy test speech directly, improved performance. We also propose a novel method to reduce the effect of noise type mismatch by compensating the noisy test speech in the log-spectrum domain. We first derive the relation between the log-spectrum vectors of the test and training noisy speech. Since this relation is a non-linear function of the speech and noise parameters, the statistics of the test log-spectrum vectors are obtained by approximation using the vector Taylor series (VTS) algorithm. Finally, the minimum mean square error estimate of the training log-spectrum vectors is used to reduce the mismatch between the training and test noisy speech. By employing the proposed methods in the MMSR framework, relative word error rate reductions of 18.7% and 21.3% were achieved on the Aurora 2 task compared with conventional MMSR and multi-condition training (MTR), respectively.
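The abstract does not state the relation it derives between test and training log-spectrum vectors, so the following is only a minimal sketch of the kind of approximation a VTS step performs. It assumes the commonly used additive-noise model in the log-spectrum domain, y = log(exp(x) + exp(n)), with independent speech x and noise n and diagonal covariances. The function names `mismatch` and `vts_first_order` are hypothetical and are not taken from the paper.

```python
import math


def mismatch(x, n):
    """Noisy log-spectrum y = log(exp(x) + exp(n)) per frequency bin.

    Assumed model (not taken from the abstract): speech and noise add in
    the linear power-spectrum domain. Written in a numerically stable form.
    """
    return [max(a, b) + math.log1p(math.exp(-abs(a - b))) for a, b in zip(x, n)]


def vts_first_order(mu_x, var_x, mu_n, var_n):
    """First-order Taylor approximation of the mean and variance of y.

    The non-linear function is linearised around (mu_x, mu_n); x and n are
    assumed independent with diagonal covariances (illustrative assumptions).
    """
    mu_y = mismatch(mu_x, mu_n)
    var_y = []
    for mx, vx, mn, vn in zip(mu_x, var_x, mu_n, var_n):
        # g = dy/dx at the expansion point; dy/dn = 1 - g.
        g = 1.0 / (1.0 + math.exp(mn - mx))
        var_y.append(g * g * vx + (1.0 - g) ** 2 * vn)
    return mu_y, var_y


if __name__ == "__main__":
    # Bin 0: speech and noise at equal level; bin 1: speech dominates.
    mu_y, var_y = vts_first_order([0.0, 10.0], [1.0, 2.0], [0.0, -10.0], [1.0, 3.0])
    print(mu_y, var_y)
```

In a bin where speech dominates, g approaches 1 and the noisy statistics approach those of clean speech; where noise dominates, g approaches 0 and the noise statistics take over. Statistics approximated in this way are what a subsequent minimum mean square error estimator would rely on.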
