Proceedings of 2010 4th International Universal Communication Symposium

Kana-to-kanji conversion method using Markov chain model of words in bunsetsu



Abstract

We previously proposed a method for kana-to-kanji conversion of non-segmented kana sentences that uses a Markov chain model of words in a sentence. However, that method did not achieve a sufficient conversion accuracy rate. We attribute this to the fact that the total number of rules in the dictionary of Markov chain probabilities of words in a sentence is not saturated. By contrast, we observe that the total number of rules in the dictionary of Markov chain probabilities of words in a bunsetsu is almost saturated. In this paper, we propose a new kana-to-kanji conversion method that uses this bunsetsu-level Markov chain model. The proposed method simultaneously detects the boundaries of kana bunsetsu in a sentence and the boundaries of kana words in each bunsetsu by using a Markov chain model of kana words in bunsetsu. It then converts the kana words into candidate kanji-kana words and selects the most likely candidate by using a Markov chain model of kanji-kana words in bunsetsu. In experiments using statistical data from a daily Japanese newspaper, we evaluate the previously proposed method (Method-B1) and the newly proposed method (Method-B2) by conversion accuracy rate. The results show that Method-B2 achieves a higher conversion accuracy rate than Method-B1 and is effective for kana-to-kanji conversion of non-segmented kana sentences.
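The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not state the order of the Markov chain, the dictionary contents, or the search procedure, so everything below is an assumption: a first-order chain, a Viterbi-style dynamic program, a `<B>` symbol marking bunsetsu boundaries, and tiny invented probability tables (`KANA_CHAIN`, `KANJI_CHAIN`, `CANDIDATES`). It is not the authors' implementation of Method-B2.

```python
import itertools
import math

B = "<B>"  # bunsetsu boundary marker (an assumption of this sketch)

# Invented first-order probabilities P(next | prev) for kana words in a bunsetsu.
KANA_CHAIN = {
    (B, "わたし"): 0.4, (B, "がっこう"): 0.3, (B, "いく"): 0.3,
    ("わたし", "は"): 0.9, ("がっこう", "に"): 0.8,
    ("は", B): 1.0, ("に", B): 1.0, ("いく", B): 1.0,
}
# Invented kanji-kana candidates for each kana word.
CANDIDATES = {
    "わたし": ["私", "渡し"], "は": ["は"], "がっこう": ["学校"],
    "に": ["に"], "いく": ["行く", "逝く"],
}
# Invented first-order probabilities for kanji-kana words in a bunsetsu.
KANJI_CHAIN = {
    (B, "私"): 0.5, (B, "渡し"): 0.1, ("私", "は"): 0.9, ("渡し", "は"): 0.2,
    (B, "学校"): 0.3, ("学校", "に"): 0.8, (B, "行く"): 0.3, (B, "逝く"): 0.01,
    ("は", B): 1.0, ("に", B): 1.0, ("行く", B): 1.0, ("逝く", B): 1.0,
}

def logp(table, prev, nxt):
    """Log transition probability; unseen transitions count as impossible."""
    p = table.get((prev, nxt), 0.0)
    return math.log(p) if p > 0 else -math.inf

def segment(kana):
    """Stage 1: find word and bunsetsu boundaries at the same time.

    Returns the most probable parse as a list of bunsetsu, each a list of
    kana words, or None if no parse has non-zero probability.
    """
    words = {w for pair in KANA_CHAIN for w in pair if w != B}
    # best[pos][last] = (log probability, parse) for the best parse of kana[:pos]
    best = [dict() for _ in range(len(kana) + 1)]
    best[0][B] = (0.0, [])
    for pos in range(len(kana)):
        for last, (score, parse) in best[pos].items():
            for w in words:
                if not kana.startswith(w, pos):
                    continue
                options = []
                if last != B:  # extend the current bunsetsu with w
                    options.append((score + logp(KANA_CHAIN, last, w),
                                    parse[:-1] + [parse[-1] + [w]]))
                # close the current bunsetsu (if any) and open a new one with w
                close = 0.0 if last == B else logp(KANA_CHAIN, last, B)
                options.append((score + close + logp(KANA_CHAIN, B, w),
                                parse + [[w]]))
                end = pos + len(w)
                for s, p in options:
                    if s > best[end].get(w, (-math.inf, None))[0]:
                        best[end][w] = (s, p)
    # The final bunsetsu must also be closed by a boundary transition.
    finals = [(s + logp(KANA_CHAIN, last, B), p)
              for last, (s, p) in best[len(kana)].items()]
    finals = [f for f in finals if f[0] > -math.inf]
    return max(finals, key=lambda f: f[0])[1] if finals else None

def convert_bunsetsu(kana_words):
    """Stage 2: choose the most likely kanji-kana candidates for one bunsetsu."""
    def score(cands):
        chain = [B, *cands, B]
        return sum(logp(KANJI_CHAIN, a, b) for a, b in zip(chain, chain[1:]))
    options = itertools.product(*(CANDIDATES[w] for w in kana_words))
    return "".join(max(options, key=score))

def kana_to_kanji(kana):
    parse = segment(kana)
    return None if parse is None else "".join(convert_bunsetsu(b) for b in parse)

print(kana_to_kanji("わたしはがっこうにいく"))  # prints 私は学校に行く
```

With these toy tables, `segment` splits the input into the bunsetsu わたし|は, がっこう|に, and いく, and the second stage prefers 私 over 渡し and 行く over 逝く. Enumerating every candidate combination per bunsetsu is only practical because bunsetsu are short; a larger dictionary would need the same dynamic-programming treatment as the first stage.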
