...
EURASIP Journal on Audio, Speech, and Music Processing

Integrated exemplar-based template matching and statistical modeling for continuous speech recognition

Abstract

We propose a novel approach that integrates exemplar-based template matching with statistical modeling to improve continuous speech recognition. We choose context-dependent phone segments (triphone context) as the template unit and use multiple Gaussian mixture model (GMM) indices to represent each frame of a speech template. We investigate two local distances for dynamic time warping (DTW)-based template matching: the log likelihood ratio (LLR) and the Kullback-Leibler (KL) divergence. To reduce computation and storage complexity, we also propose two template selection methods: minimum distance template selection (MDTS) and maximum likelihood template selection (MLTS). We further propose to fine-tune the MLTS template representatives with a GMM merging algorithm so that the GMMs better represent the frames of the selected template representatives. Experimental results on the TIMIT phone recognition task and on a large vocabulary continuous speech recognition (LVCSR) task of telehealth captioning demonstrated that integrating template matching with statistical modeling significantly improved recognition accuracy over the hidden Markov model (HMM) baselines on both tasks. The template selection methods also provided significant accuracy gains over the HMM baseline while substantially reducing computation and storage complexity. When all templates or MDTS were used, the LLR local distance performed better than the KL local distance. For MLTS and template compression, the KL local distance performed better than the LLR local distance, and template compression further improved recognition accuracy over MLTS alone at a lower computational cost.
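
DTW-based template matching aligns a test segment against a stored template frame by frame and sums a local distance along the best alignment path. The abstract does not say how the LLR or KL local distances are computed from the GMM-index frame representation, so the sketch below assumes, hypothetically, that each frame is a discrete posterior distribution over a shared set of Gaussian components and compares frames with a symmetric KL divergence. The DTW recursion is the standard one; the names `dtw_distance` and `symmetric_kl` are illustrative only.

```python
import math
from typing import Callable, Sequence, TypeVar

Frame = TypeVar("Frame")


def dtw_distance(
    query: Sequence[Frame],
    template: Sequence[Frame],
    local_dist: Callable[[Frame, Frame], float],
) -> float:
    """Standard DTW alignment cost between two frame sequences.

    Allowed moves are diagonal, horizontal and vertical; the result is the
    accumulated local distance along the cheapest path.
    """
    n, m = len(query), len(template)
    # cost[i][j] = best accumulated cost aligning query[:i] with template[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_dist(query[i - 1], template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


def symmetric_kl(p: Sequence[float], q: Sequence[float], eps: float = 1e-10) -> float:
    """Symmetric KL divergence KL(p||q) + KL(q||p) between two discrete distributions.

    Assumed (hypothetical) frame representation: a posterior distribution over
    Gaussian components. eps guards against log(0) for empty bins.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        pi, qi = pi + eps, qi + eps
        # KL(p||q) + KL(q||p) simplifies to sum((p - q) * log(p / q))
        total += (pi - qi) * math.log(pi / qi)
    return total
```

Passing the local distance as a callable keeps the alignment code independent of the distance, which mirrors how the abstract compares LLR and KL under the same DTW framework.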
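
The abstract names minimum distance template selection (MDTS) but does not define it. One plausible reading, assumed here and not confirmed by the source, is a medoid-style choice: for each triphone unit, keep the k templates whose summed distance to all other templates of that unit is smallest. The function name `select_min_distance_templates` is hypothetical, and `dist` would in practice be a DTW alignment cost rather than the scalar distance used in a quick check.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select_min_distance_templates(
    templates: Sequence[T],
    dist: Callable[[T, T], float],
    k: int = 1,
) -> List[int]:
    """Return indices of the k templates with the smallest summed distance
    to every other template of the same unit (medoid-style selection).

    Hypothetical sketch of MDTS; the paper's exact criterion may differ.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    scored = []
    for i, ti in enumerate(templates):
        # Total distance from template i to all other templates.
        total = sum(dist(ti, tj) for j, tj in enumerate(templates) if j != i)
        scored.append((total, i))
    # Smallest total first; ties are broken by the lower index.
    scored.sort()
    return [i for _, i in scored[:k]]
```

Under this reading, selection costs O(N^2) distance evaluations per unit, paid once offline; at decoding time each hypothesis is compared against only k representatives instead of all N templates, which is where computation and storage savings would come from.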
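
The abstract also mentions a GMM merging algorithm for fine-tuning the MLTS representatives without naming it. A common building block, assumed here and not necessarily the authors' method, is moment-preserving merging of two diagonal-covariance Gaussian components: the merged component keeps the pair's combined weight, mean, and variance. The `DiagGaussian` type and `merge_moment_matched` function are illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiagGaussian:
    """One weighted Gaussian component with a diagonal covariance."""
    weight: float
    mean: List[float]
    var: List[float]


def merge_moment_matched(a: DiagGaussian, b: DiagGaussian) -> DiagGaussian:
    """Merge two components so the result preserves their combined
    zeroth, first and second moments (per dimension)."""
    w = a.weight + b.weight
    if w <= 0.0:
        raise ValueError("combined weight must be positive")
    # Weighted average of the means.
    mean = [(a.weight * ma + b.weight * mb) / w for ma, mb in zip(a.mean, b.mean)]
    # Each component's variance plus the spread of its mean around the merged mean.
    var = [
        (a.weight * (va + (ma - m) ** 2) + b.weight * (vb + (mb - m) ** 2)) / w
        for ma, mb, va, vb, m in zip(a.mean, b.mean, a.var, b.var, mean)
    ]
    return DiagGaussian(w, mean, var)
```

Reducing a GMM usually applies such a merge repeatedly, each time to the pair whose merge costs least under some criterion; which criterion the paper uses is not stated in the abstract.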
