Integrated exemplar-based template matching and statistical modeling for continuous speech recognition

Xie Sun; Yunxin Zhao

Abstract

We propose a novel approach that integrates exemplar-based template matching with statistical modeling to improve continuous speech recognition. We choose context-dependent phone segments (triphone context) as the template unit and use multiple Gaussian mixture model (GMM) indices to represent each frame of the speech templates. We investigate two local distances, log likelihood ratio (LLR) and Kullback-Leibler (KL) divergence, for dynamic time warping (DTW)-based template matching. To reduce computation and storage complexity, we also propose two methods for template selection: minimum distance template selection (MDTS) and maximum likelihood template selection (MLTS). We further propose to fine-tune the MLTS template representatives with a GMM merging algorithm so that the GMMs better represent the frames of the selected template representatives. Experimental results on the TIMIT phone recognition task and a large vocabulary continuous speech recognition (LVCSR) task of telehealth captioning demonstrated that integrating template matching with statistical modeling significantly improved recognition accuracy over the hidden Markov model (HMM) baselines on both tasks. The template selection methods also provided significant accuracy gains over the HMM baseline while substantially reducing computation and storage complexity. When all templates or MDTS were used, the LLR local distance performed better than the KL local distance. For MLTS and template compression, the KL local distance performed better than the LLR local distance, and template compression further improved recognition accuracy on top of MLTS at a lower computational cost.
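The DTW-based matching and distance-based template selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, purely for illustration, that each frame is a small discrete probability distribution rather than a set of GMM indices, uses a symmetrized KL divergence as the local distance, and picks the template with the smallest total DTW distance to the others as one plausible reading of "minimum distance" selection. The names `kl_divergence`, `symmetric_kl`, `dtw_distance`, and `select_min_distance_template` are hypothetical.

```python
import math
from typing import Callable, List, Sequence

# Assumption: a frame is a discrete probability distribution (a stand-in
# for the GMM-index frame representation mentioned in the abstract).
Frame = Sequence[float]
Template = Sequence[Frame]


def kl_divergence(p: Frame, q: Frame, eps: float = 1e-10) -> float:
    """KL(p || q) for discrete distributions; values are floored at eps."""
    return sum(
        pi * math.log(max(pi, eps) / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def symmetric_kl(p: Frame, q: Frame) -> float:
    """Symmetrized KL divergence, used here as the DTW local distance."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def dtw_distance(
    a: Template,
    b: Template,
    local: Callable[[Frame, Frame], float] = symmetric_kl,
) -> float:
    """Accumulated DTW cost with the basic three-way step pattern."""
    n, m = len(a), len(b)
    cost: List[List[float]] = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local(a[i - 1], b[j - 1])
            # Extend the cheapest of: vertical, horizontal, diagonal step.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def select_min_distance_template(templates: Sequence[Template]) -> int:
    """Index of the template with the smallest summed DTW distance to all
    templates (a medoid-style choice, assumed here for illustration)."""
    totals = [sum(dtw_distance(t, u) for u in templates) for t in templates]
    return min(range(len(templates)), key=totals.__getitem__)


if __name__ == "__main__":
    t1 = [[0.9, 0.1], [0.1, 0.9]]
    t2 = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]  # t1 with a repeated first frame
    t3 = [[0.1, 0.9], [0.9, 0.1]]              # t1 with the frame order reversed
    print(dtw_distance(t1, t2))
    print(dtw_distance(t1, t3))
    print(select_min_distance_template([t1, t2, t3]))
```

Because DTW may align one frame to several, a template that only repeats a frame stays at zero distance from the original, while a reordered template does not. A different local distance, such as a log likelihood ratio, can be swapped in through the `local` parameter.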

Bibliographic details

Source
EURASIP Journal on Audio, Speech, and Music Processing | 2014, Issue 1 | 16 pages
Authors
Xie Sun; Yunxin Zhao
Original format: PDF
Chinese Library Classification: Radio electronics, telecommunications technology
