首页> 中文期刊> 《郑州大学学报(理学版)》 >基于最大熵模型的词位标注汉语分词

基于最大熵模型的词位标注汉语分词

         

摘要

The performance of Chinese word segmentation has been greatly improved by word-position-based approaches in recent years.This approach treated Chinese word segmentation as a word-position tagging.With the help of powerful sequence tagging model, word-position-based method quickly rose as a mainstream technique in this field.Feature template selection and tag sets selection was crucial in this method.The technique was studied via using different word-positions tag sets and maximum entropy model.Closed evaluations were performed on corpus from the second international Chinese word segmentation Bakeoff-2005, and comparative experiments were performed on different tag sets and feature templates.Experimental results showed that the feature template set TMPT-6 and six word-position tag sets was much better than the other.%近年来基于字的词位标注汉语分词方法极大地提高了分词的性能,该方法将汉语分词转化为字的词位标注问题,借助于优秀的序列标注模型,词位标注汉语分词逐渐成为汉语分词的主要技术路线.该方法中特征模板集设定和词位标注集的选择至关重要,采用不同的词位标注集,使用最大熵模型进一步研究了词位标注汉语分词技术.在国际汉语分词评测Bakeoff2005的语料上进行了封闭测试,并对比了不同词位标注集对分词性能的影响.实验表明所采用的六词位标注集配合相应的特征模板集TMPT-6较其他词位标注集分词性能要好.

著录项

相似文献

  • 中文文献
  • 外文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号