IEICE Transactions on Information and Systems

Reducing Computation Time of the Rapid Unsupervised Speaker Adaptation Based on HMM-Sufficient Statistics



Abstract

In real-time speech recognition applications, speaker adaptation must be both fast and reliable. We propose a method to reduce the adaptation time of rapid unsupervised speaker adaptation based on HMM-Sufficient Statistics. A single arbitrary utterance, with no transcription, is used to select the Sufficient Statistics of the N-best speakers, which were created offline; these statistics provide the data for adapting the model to the target speaker. Reducing N further shortens adaptation time, but it also degrades recognition performance because too little data remains to adapt the model robustly. Linear interpolation with the global HMM-Sufficient Statistics offsets this negative effect and cuts adaptation time by 50% without compromising recognition performance. We also compare our method with Vocal Tract Length Normalization (VTLN), Maximum A Posteriori (MAP) adaptation, and Maximum Likelihood Linear Regression (MLLR), and evaluate it in office, car, crowd, and booth noise environments at SNRs of 10 dB, 15 dB, 20 dB, and 25 dB.
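The abstract describes the adaptation pipeline only at a high level. The following is a minimal Python sketch of the general idea under assumptions that are not taken from the paper: each Gaussian's offline statistics are an occupancy count plus first- and second-order sums (diagonal covariance), speakers are ranked by scoring the utterance with one Gaussian pooled from each speaker's statistics, and the global statistics are mixed in with a convex weight `lam`. The names `SuffStats`, `select_n_best`, and `adapt` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence

Vector = List[float]


@dataclass
class SuffStats:
    """Hypothetical per-Gaussian sufficient statistics (diagonal covariance):
    occupancy count, per-dimension sum of x, and per-dimension sum of x^2."""
    count: float
    sum_x: Vector
    sum_x2: Vector

    def __add__(self, other: "SuffStats") -> "SuffStats":
        return SuffStats(
            self.count + other.count,
            [a + b for a, b in zip(self.sum_x, other.sum_x)],
            [a + b for a, b in zip(self.sum_x2, other.sum_x2)],
        )

    def scaled(self, w: float) -> "SuffStats":
        return SuffStats(w * self.count, [w * a for a in self.sum_x], [w * a for a in self.sum_x2])

    def mean_var(self, var_floor: float = 1e-3):
        """ML estimate of the mean and floored diagonal variance from the statistics."""
        mean = [s / self.count for s in self.sum_x]
        var = [max(s2 / self.count - m * m, var_floor) for s2, m in zip(self.sum_x2, mean)]
        return mean, var


# One speaker's offline statistics: Gaussian id -> SuffStats.
SpeakerStats = Dict[str, SuffStats]


def pool(stats: Sequence[SpeakerStats]) -> SpeakerStats:
    """Sum statistics per Gaussian id across several speakers."""
    pooled: SpeakerStats = {}
    for spk in stats:
        for gid, s in spk.items():
            pooled[gid] = pooled[gid] + s if gid in pooled else s
    return pooled


def log_gauss(x: Vector, mean: Vector, var: Vector) -> float:
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def speaker_score(frames: Sequence[Vector], spk: SpeakerStats) -> float:
    """Assumed selection model: average log-likelihood of the untranscribed
    utterance under one Gaussian pooled from all of the speaker's statistics."""
    total = None
    for s in spk.values():
        total = s if total is None else total + s
    mean, var = total.mean_var()
    return sum(log_gauss(x, mean, var) for x in frames) / len(frames)


def select_n_best(frames, speakers: Dict[str, SpeakerStats], n: int) -> List[str]:
    """Rank speakers by utterance score and keep the top n."""
    ranked = sorted(speakers, key=lambda name: speaker_score(frames, speakers[name]), reverse=True)
    return ranked[:n]


def adapt(frames, speakers, global_stats: SpeakerStats, n: int, lam: float):
    """Pool the N-best speakers' statistics, interpolate linearly with the global
    statistics (assumed form: (1 - lam) * pooled + lam * global), and re-estimate
    each Gaussian's mean and variance in a single pass."""
    chosen = select_n_best(frames, speakers, n)
    pooled = pool([speakers[name] for name in chosen])
    adapted = {}
    for gid, g in global_stats.items():
        local = pooled.get(gid)
        # Fall back to the global statistics if no selected speaker covers this Gaussian.
        mixed = g if local is None else local.scaled(1 - lam) + g.scaled(lam)
        adapted[gid] = mixed.mean_var()
    return adapted
```

In this sketch the pooling work grows with `n`, so a smaller `n` means less work per adaptation, while `lam > 0` pulls sparsely estimated Gaussians back toward the global statistics. How the paper actually weights the global statistics is not stated in the abstract.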
