Journal: Computers & Mathematics with Applications

An HMM-based method for Thai spelling speech recognition



Abstract

Spelling speech recognition can serve several purposes, including enhancing speech recognition systems and implementing name retrieval systems. This paper presents an approach for constructing three recognizers, based on hidden Markov models (HMMs), for the three commonly used Thai spelling methods. The Thai phonetic characteristics, alphabet system and spelling methods are analyzed. For the first spelling method, two recognizers are explored: one trained on a small spelling corpus and one trained on an existing large continuous-speech corpus. To compensate for the difference in utterance speed between spelling utterances and continuous-speech utterances, utterance-speed adjustment is taken into account. Two alternative language models, bigram and trigram, are investigated to evaluate spelling speech recognition performance under three different language-model environments: close-type, open-type and mix-type. For the first spelling method, our approach achieves up to 93.09% letter correct rate (LCR) and 92.45% letter accuracy (LA) when a trigram language model is used in the mix-type environment and the acoustic model is trained on the small spelling corpus. Under the same conditions, we obtained 81.12% LCR and 76.32% LA for the second spelling method, and 78.47% LCR and 71.75% LA for the third. Analysis of the results shows that the main source of errors is letter substitution, which is mostly caused by confusion between similar consonant phones and between short/long vowel pairs.
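The abstract reports letter correct rate (LCR) and letter accuracy (LA) but does not define them. A common convention, used for example by HTK's HResults tool, aligns the recognized letter sequence with the reference and counts substitutions S, deletions D and insertions I over N reference letters, giving LCR = (N − S − D)/N and LA = (N − S − D − I)/N. Under that assumption, LA can only be lower than LCR, which is consistent with the reported figure pairs. The sketch below is hypothetical and not the authors' scoring code. It uses unit edit costs and breaks ties in favor of match/substitution, so its S/D/I split may differ slightly from tools that weight edits differently. The function names `align_counts` and `letter_scores` are illustrative only.

```python
from typing import Sequence, Tuple


def align_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) from a minimum-edit alignment.

    Assumption: every edit costs 1 (HTK's HResults uses different weights).
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match or substitution
                           dp[i - 1][j] + 1,         # deletion of a reference letter
                           dp[i][j - 1] + 1)         # insertion of an extra letter
    # Backtrace from the end, preferring match/substitution on ties.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            mismatch = int(ref[i - 1] != hyp[j - 1])
            if dp[i][j] == dp[i - 1][j - 1] + mismatch:
                subs += mismatch
                i, j = i - 1, j - 1
                continue
        if i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


def letter_scores(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[float, float]:
    """Return (LCR, LA) in percent under the HTK-style definitions above."""
    n = len(ref)
    if n == 0:
        raise ValueError("reference must contain at least one letter")
    subs, dels, ins = align_counts(ref, hyp)
    lcr = 100.0 * (n - subs - dels) / n
    la = 100.0 * (n - subs - dels - ins) / n
    return lcr, la


if __name__ == "__main__":
    # One substitution (B -> X) and one insertion (E) against four reference letters.
    print(letter_scores(list("ABCD"), list("AXCDE")))  # (75.0, 50.0)
```

The example shows why LA is the stricter measure. An inserted letter leaves LCR unchanged but lowers LA. A substitution, the dominant error type reported in the abstract, lowers both.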
