METHOD AND APPARATUS FOR CREATING A LANGUAGE MODEL AND KANA-KANJI CONVERSION

Abstract

A method for creating a language model that prevents the quality deterioration caused by conventional back-off to unigram. Parts of speech with the same display and reading are obtained from a storage device (206). A cluster (204) is created by combining the obtained parts of speech, and the created cluster (204) is stored in the storage device (206). In addition, when an instruction (214) for dividing the cluster is input, the cluster stored in the storage device (206) is divided (210) in accordance with the input instruction (212). Two of the clusters stored in the storage device are combined (218), and the probability of occurrence of the combined clusters in the text corpus is calculated (222). The combined cluster is associated with the bigram indicating the calculated probability and is stored in the storage device.
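The steps in the abstract can be illustrated with a small sketch. It is only one possible reading, not the patented implementation: the function names, the `(display, reading, part_of_speech)` token layout, the toy lexicon and corpus, and the use of a plain maximum-likelihood estimate for the bigram probability are all assumptions made for illustration. "Combining two clusters" is interpreted here as pairing adjacent clusters in the corpus.

```python
from collections import Counter, defaultdict


def build_clusters(lexicon):
    """Group parts of speech that share the same display and reading."""
    grouped = defaultdict(set)
    for display, reading, pos in lexicon:
        grouped[(display, reading)].add(pos)
    # Each (display, reading) starts with one cluster holding all its tags.
    return {key: [frozenset(tags)] for key, tags in grouped.items()}


def split_cluster(clusters, key, subset):
    """Divide the clusters under `key` into tags inside and outside `subset`."""
    subset = frozenset(subset)
    result = []
    for cluster in clusters[key]:
        inside, outside = cluster & subset, cluster - subset
        result.extend(part for part in (inside, outside) if part)
    clusters[key] = result


def cluster_of(clusters, token):
    """Return the cluster identifier that a corpus token belongs to."""
    display, reading, pos = token
    for cluster in clusters[(display, reading)]:
        if pos in cluster:
            return (display, reading, cluster)
    raise KeyError(token)


def cluster_bigrams(corpus, clusters):
    """Estimate P(right cluster | left cluster) by relative frequency."""
    left_counts, pair_counts = Counter(), Counter()
    for sentence in corpus:
        ids = [cluster_of(clusters, tok) for tok in sentence]
        for left, right in zip(ids, ids[1:]):
            left_counts[left] += 1
            pair_counts[(left, right)] += 1
    return {pair: n / left_counts[pair[0]] for pair, n in pair_counts.items()}


# Hypothetical lexicon and corpus; the tag names are invented.
lexicon = [
    ("行", "い", "verb-stem-A"),
    ("行", "い", "verb-stem-B"),
    ("行", "い", "noun"),
    ("く", "く", "suffix"),
    ("き", "き", "suffix"),
]
corpus = [
    [("行", "い", "verb-stem-A"), ("く", "く", "suffix")],
    [("行", "い", "verb-stem-B"), ("く", "く", "suffix")],
    [("行", "い", "verb-stem-A"), ("き", "き", "suffix")],
    [("行", "い", "noun"), ("き", "き", "suffix")],
]

clusters = build_clusters(lexicon)
split_cluster(clusters, ("行", "い"), {"noun"})  # the "divide" instruction
probs = cluster_bigrams(corpus, clusters)
```

In this toy data the two verb-stem tags stay in one cluster after the split, so their counts are pooled when the bigram is estimated. That pooling is the kind of effect that could reduce the need to back off to unigrams for sparsely observed tags.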

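Bibliographic data

  • Publication number: KR101279676B1
  • Patent type: (not listed)
  • Publication date: 2013-06-27
  • Original format: PDF
  • Applicant/patentee: (not listed)
  • Application number: KR20077030209
  • Filing date: 2006-06-23
  • Classification: G06F17/28
  • Country: KR
  • Database entry time: 2022-08-21 16:24:55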
