Journal of Chinese Information Processing (中文信息学报)

A Self-Organizing Word Segmentation Ambiguity Resolution Scheme Based on Unsupervised EM Training

Abstract

This paper presents a scheme for resolving word segmentation ambiguity based on unsupervised training, together with a segmentation algorithm. Following the idea of EM, all segmentation candidates of each sentence (or those within a certain range) form the training set. From this training set and an initial language model, a new language model is estimated by collecting the fractional counts of patterns (such as bigram pairs) over the candidates. The final language model is obtained after multiple iterations. A statistical segmenter built on this final language model correctly segments 85.36% of the sentences in a test set in which every sentence contains at least one ambiguity (accuracy is measured per sentence).
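The training loop outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a unigram model in place of the bigram patterns the abstract mentions, enumerates candidates exhaustively, and starts from a uniform initial model. The function names (`segmentations`, `em_train`, `segment`), the toy lexicon, and the toy corpus are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def segmentations(sentence, lexicon, max_len=4):
    """Enumerate every split of `sentence` into words found in `lexicon`."""
    if not sentence:
        return [[]]
    results = []
    for i in range(1, min(max_len, len(sentence)) + 1):
        word = sentence[:i]
        if word in lexicon:
            for rest in segmentations(sentence[i:], lexicon, max_len):
                results.append([word] + rest)
    return results


def score(words, probs):
    """Unigram probability of one candidate (assumed model, not the paper's)."""
    p = 1.0
    for w in words:
        p *= probs[w]
    return p


def em_train(corpus, lexicon, iterations=5):
    """Re-estimate word probabilities from fractional counts over all candidates."""
    # Assumed initial language model: uniform over the lexicon.
    probs = {w: 1.0 / len(lexicon) for w in lexicon}
    for _ in range(iterations):
        counts = defaultdict(float)
        # E-step: weight each candidate by its normalized score under the current model.
        for sentence in corpus:
            cands = segmentations(sentence, lexicon)
            weights = [score(c, probs) for c in cands]
            total = sum(weights)
            if total == 0:
                continue  # sentence not coverable under the current model
            for cand, weight in zip(cands, weights):
                for w in cand:
                    counts[w] += weight / total
        # M-step: normalize the fractional counts into a new model.
        norm = sum(counts.values())
        probs = {w: counts.get(w, 0.0) / norm for w in lexicon}
    return probs


def segment(sentence, lexicon, probs):
    """Pick the highest-scoring candidate; assumes at least one candidate exists."""
    return max(segmentations(sentence, lexicon), key=lambda c: score(c, probs))


# Toy data (hypothetical): "研究生命" is ambiguous between 研究/生命 and 研究生/命.
lexicon = {"研究", "研究生", "生命", "命", "起源"}
corpus = ["研究生命起源", "生命起源", "研究起源"]
probs = em_train(corpus, lexicon)
print(segment("研究生命", lexicon, probs))
```

In this toy corpus the unambiguous sentences lend extra fractional count to 研究 and 生命, so successive iterations shift probability mass away from the 研究生/命 reading. A bigram version would accumulate fractional counts for adjacent word pairs instead of single words, and a practical system would likely replace the exhaustive enumeration with a lattice-based computation.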
