STATISTICAL LANGUAGE MODEL GENERATING DEVICE, SPEECH RECOGNITION DEVICE, INFORMATION RETRIEVAL PROCESSOR AND KANA/KANJI CONVERTER

Abstract

PROBLEM TO BE SOLVED: To generate a statistical language model that improves speech recognition accuracy for words not registered in the word dictionary and that can identify the domain and class of such unregistered words. SOLUTION: An unregistered word model generating section 20 assumes that, in the learning data, the proportion of words at each mora length practically follows a gamma distribution, and estimates the parameters of this mora-length gamma distribution separately for each class. For each class that is a lower-order class of proper nouns or of common nouns that are adopted words (loanwords), it computes the appearance probabilities of a first N-gram over subword units, each unit being a mora or a mora chain, and thereby generates a subword unit N-gram model that models word sequences including unregistered words. A language model generating section 24 generates a subword-unit-based statistical language model that includes unregistered words from the word class N-gram model, the subword unit N-gram model, and the parameters of the gamma distribution of mora length.
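The two ingredients named in the abstract, a class-dependent gamma distribution over mora length and an N-gram over morae, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the method-of-moments fit, the choice of N = 2 with add-alpha smoothing, the additive combination of the two log scores, the function names, and the toy place-name data.

```python
import math
from collections import Counter, defaultdict


def fit_gamma_moments(lengths):
    """Method-of-moments estimate of gamma (shape, scale) from mora lengths."""
    n = len(lengths)
    mean = sum(lengths) / n
    var = sum((x - mean) ** 2 for x in lengths) / n
    if var == 0:
        raise ValueError("lengths must not all be identical")
    return mean * mean / var, var / mean


def gamma_log_pdf(x, shape, scale):
    """Log density of a gamma distribution at x > 0."""
    return ((shape - 1) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def train_mora_bigram(words, alpha=1.0):
    """Add-alpha smoothed bigram over morae, with word start/end markers."""
    counts = defaultdict(Counter)
    vocab = {"</w>"}
    for morae in words:
        vocab.update(morae)
        seq = ["<w>"] + list(morae) + ["</w>"]
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    size = len(vocab)

    def log_prob(prev, cur):
        row = counts[prev]
        return math.log((row[cur] + alpha) / (sum(row.values()) + alpha * size))

    return log_prob


def unregistered_word_log_score(morae, shape, scale, log_prob):
    """Assumed combination: length log density plus mora bigram log probability."""
    seq = ["<w>"] + list(morae) + ["</w>"]
    ngram = sum(log_prob(p, c) for p, c in zip(seq, seq[1:]))
    return gamma_log_pdf(len(morae), shape, scale) + ngram


# Hypothetical training words for one class (place names), split into morae.
place_names = [
    ["to", "o", "kyo", "o"],
    ["o", "o", "sa", "ka"],
    ["na", "go", "ya"],
    ["kyo", "o", "to"],
    ["yo", "ko", "ha", "ma"],
]
shape, scale = fit_gamma_moments([len(w) for w in place_names])
log_prob = train_mora_bigram(place_names)

typical = unregistered_word_log_score(["kyo", "o", "sa", "ka"], shape, scale, log_prob)
atypical = unregistered_word_log_score(["x"] * 9, shape, scale, log_prob)
```

In this sketch a candidate whose length and mora sequence resemble the class's training words receives a higher score than an implausibly long string of unseen morae. Fitting one such model per class and comparing scores is one way the class of an unregistered word could be identified.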