Method and apparatus for creating a language model and kana-kanji conversion

Abstract

A method for creating a language model that prevents the deterioration in quality caused by conventional back-off to unigram. Parts-of-speech with the same display and reading are obtained from a storage device (206). A cluster (204) is created by combining the obtained parts-of-speech, and the created cluster (204) is stored in the storage device (206). When an instruction (214) for dividing the cluster is inputted, the cluster stored in the storage device (206) is divided (210) in accordance with the inputted instruction (212). Two of the clusters stored in the storage device are combined (218), and the probability of occurrence of the combined clusters in the text corpus is calculated (222). The combined cluster is associated with a bigram indicating the calculated probability and stored in the storage device.
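The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the (display, reading, part-of-speech) token layout, and the in-memory dictionary standing in for the storage device are all assumptions made for illustration. The abstract does not say how the probability of occurrence is estimated, so the sketch assumes a plain relative-frequency estimate of P(second cluster | first cluster) over adjacent tokens.

```python
from collections import Counter, defaultdict

# Hypothetical data layout: a token is (display, reading, part_of_speech).
# A plain dict stands in for the storage device described in the abstract.

def build_clusters(lexicon):
    """Group the parts-of-speech that share one display and reading."""
    grouped = defaultdict(set)
    for display, reading, pos in lexicon:
        grouped[(display, reading)].add(pos)
    # One initial cluster per (display, reading): the set of its parts-of-speech.
    return {key: [frozenset(tags)] for key, tags in grouped.items()}

def split_cluster(clusters, key, cluster, separated):
    """Divide `cluster` into `separated` and the remainder, per an instruction."""
    separated = frozenset(separated)
    if not separated or not separated < cluster:
        raise ValueError("instruction must name a proper, non-empty subset")
    parts = clusters[key]
    parts.remove(cluster)  # raises ValueError if the cluster is not stored
    parts.extend([separated, cluster - separated])

def cluster_of(clusters, token):
    """Return the identifier of the stored cluster that contains `token`."""
    display, reading, pos = token
    for cluster in clusters[(display, reading)]:
        if pos in cluster:
            return (display, reading, cluster)
    raise KeyError(token)

def cluster_bigrams(clusters, corpus):
    """Assumed estimator: relative frequency of adjacent cluster pairs."""
    first_counts, pair_counts = Counter(), Counter()
    for sentence in corpus:
        ids = [cluster_of(clusters, tok) for tok in sentence]
        for a, b in zip(ids, ids[1:]):
            first_counts[a] += 1
            pair_counts[(a, b)] += 1
    # Map each combined cluster pair to its bigram probability.
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}
```

Because the bigram table is keyed on clusters rather than individual parts-of-speech, a pair that was rare for one part-of-speech can still receive a bigram estimate through its cluster, which is one plausible reading of how the method avoids falling back to unigram. Splitting a cluster and recomputing the table shows the effect of a division instruction on the stored bigrams.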