Source: Proceedings of the Ninth International Conference on Machine Learning and Cybernetics

A Chinese word segmentation algorithm based on maximum entropy

Abstract

Automatic word segmentation is an important component of modern Chinese information processing and a key technology for Chinese full-text retrieval. This paper presents a Chinese word segmentation algorithm based on maximum entropy. It uses the part-of-speech tags and word-frequency annotations of a corpus to build a maximum entropy model based on mutual information, which serves as the language model for word segmentation. Finally, a bigram model is used to test whether expanding the training corpus affects word segmentation accuracy, and curves relating the expansion of the training corpus to segmentation accuracy are obtained.
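
The abstract names mutual information as the statistic underlying the model but gives no formula or code. As a rough illustration only, the sketch below computes pointwise mutual information (PMI) for adjacent character pairs in a toy corpus and cuts a sentence wherever the PMI falls below a threshold. This is not the paper's maximum entropy model: the function names `pmi_table` and `segment`, the threshold value, and the toy corpus are all assumptions made for this example, and the part-of-speech and word-frequency features mentioned in the abstract are left out.

```python
import math
from collections import Counter


def pmi_table(sentences):
    """Return PMI (log base 2) for every adjacent character pair seen in the corpus."""
    unigrams = Counter()
    bigrams = Counter()
    for s in sentences:
        unigrams.update(s)               # count single characters
        bigrams.update(zip(s, s[1:]))    # count adjacent character pairs
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    table = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table


def segment(sentence, table, threshold):
    """Join adjacent characters whose PMI is at least `threshold`; cut otherwise.

    Pairs never seen in the corpus are always cut (hypothetical policy).
    """
    if not sentence:
        return []
    words = [sentence[0]]
    for a, b in zip(sentence, sentence[1:]):
        if table.get((a, b), float("-inf")) >= threshold:
            words[-1] += b       # strong association: extend the current word
        else:
            words.append(b)      # weak association: start a new word
    return words


# Toy corpus, invented for this example only.
corpus = ["中文分词", "中文信息", "分词技术"]
table = pmi_table(corpus)
print(segment("中文分词", table, threshold=2.5))
```

On this toy corpus the pairs 中文 and 分词 each score a PMI of 3, while 文分 scores 2, so a threshold of 2.5 yields `['中文', '分词']`. In a maximum entropy segmenter, a statistic like this would typically enter as one feature among several rather than as a hard cutoff.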
