Complete recognition of continuous Mandarin speech for Chinese language with very large vocabulary but limited training data

Abstract

This paper presents the first known results for complete recognition of continuous Mandarin speech for Chinese language with very large vocabulary but very limited training data. Although some isolated-syllable-based or isolated-word-based large-vocabulary Mandarin speech recognition systems have been successfully developed, a continuous-speech-based system of this kind has never been reported before. For successful development of this system, several important techniques have been used, including acoustic modeling of a set of sub-syllabic models for base syllable recognition and another set of context-dependent models for tone recognition, a multiple candidate searching technique based on a concatenated syllable matching algorithm to synchronize base syllable and tone recognition, and a word-class-based Chinese language model for linguistic decoding. The best recognition accuracy achieved is 88.69% for finally decoded Chinese characters, with 88.69%, 91.57%, and 81.37% accuracy for base syllables, tones, and tonal syllables respectively.
