Conference: International Conference on Machine Learning and Cybernetics

A Chinese word segmentation algorithm based on maximum entropy

Abstract

Automatic word segmentation is an important component of modern Chinese information processing and a key technology for Chinese full-text retrieval. This paper presents a Chinese word segmentation algorithm based on maximum entropy. It uses the part-of-speech and word-frequency annotations of a corpus to build a maximum entropy model based on mutual information, which serves as the language model for segmentation. Finally, the binary model is used to test whether expanding the training corpus affects segmentation accuracy, and curves relating the expansion of the training corpus to segmentation accuracy are obtained.
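
The abstract does not give the model's details, so the following is only a minimal sketch of the mutual-information statistic it mentions, not the paper's maximum entropy model. It assumes pointwise mutual information (PMI) between adjacent characters, estimated from raw counts, and a simple threshold rule for placing word boundaries. The function names `adjacent_pmi` and `segment`, the toy corpus, and the threshold value are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def adjacent_pmi(sentences):
    """Estimate PMI (in bits) for every adjacent character pair.

    Assumption: probabilities are plain relative frequencies over the
    given sentences; no smoothing is applied.
    """
    char_counts = Counter()
    pair_counts = Counter()
    for s in sentences:
        char_counts.update(s)               # single-character counts
        pair_counts.update(zip(s, s[1:]))   # adjacent-pair counts
    n_chars = sum(char_counts.values())
    n_pairs = sum(pair_counts.values())

    pmi = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / n_pairs
        p_a = char_counts[a] / n_chars
        p_b = char_counts[b] / n_chars
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi


def segment(sentence, pmi, threshold):
    """Cut between two characters whenever their PMI is below threshold.

    Pairs never seen in the counts are treated as boundaries.
    """
    if not sentence:
        return []
    words = [sentence[0]]
    for a, b in zip(sentence, sentence[1:]):
        if pmi.get((a, b), float("-inf")) >= threshold:
            words[-1] += b      # strongly associated: same word
        else:
            words.append(b)     # weakly associated: start a new word
    return words


# Hypothetical toy corpus, for illustration only.
corpus = ["中文分词", "中文信息", "分词算法"]
scores = adjacent_pmi(corpus)
print(segment("中文分词", scores, threshold=2.5))
```

In a maximum entropy segmenter, a score like this would more plausibly enter as one feature among several (alongside part-of-speech and frequency features) rather than as a hard threshold; the threshold rule here only makes the statistic's effect visible.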