International Conference on Intelligent Computing

Integration of Named Entity Information for Chinese Word Segmentation Based on Maximum Entropy

Abstract

Word segmentation is an essential step in Chinese information processing. Although related research has been reported and progress has been made, the problem of unknown named entities (UNEs) in segmentation has not been fully solved, and UNEs generally degrade overall segmentation accuracy. This paper presents a model that identifies UNEs in order to improve the overall performance of segmentation. To capture named entity (NE) information, the functions (roles) of characters or words are represented by tags. In addition, useful surrounding contexts are collected from a corpus and used as features. The model is built on Maximum Entropy and treats UNE identification as a tagging problem. Empirical experiments show that overall segmentation accuracy improves after the UNE identification module is integrated into the word segmenter.
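The abstract frames UNE identification as character tagging with a Maximum Entropy classifier over context features, followed by integration of the result into the segmenter. It does not give the tag set, the feature templates, or the estimation method, so the Python sketch that follows fills those in with assumptions: a hypothetical B/I/O tag set, character-window features, and plain stochastic gradient ascent standing in for the GIS/IIS-style training often used for MaxEnt models. All names (`features`, `MaxEntTagger`, `ne_spans`, `merge_entities`) and the toy data are illustrative only, not taken from the paper.

```python
import math
from collections import defaultdict

# Hypothetical tag set (assumption, not taken from the paper):
# B = first character of a named entity, I = inside one, O = outside.
TAGS = ["B", "I", "O"]


def features(chars, i):
    """Assumed context-window feature templates for the character at position i."""
    def c(j):
        return chars[j] if 0 <= j < len(chars) else "<s>"  # pad outside the sentence
    return [
        "bias",
        "c0=" + c(i),
        "c-1=" + c(i - 1),
        "c+1=" + c(i + 1),
        "c-1c0=" + c(i - 1) + c(i),
        "c0c+1=" + c(i) + c(i + 1),
    ]


class MaxEntTagger:
    """Conditional MaxEnt model: p(t | x) is proportional to exp(sum of w[(f, t)])."""

    def __init__(self):
        self.w = defaultdict(float)  # one weight per (feature, tag) pair

    def probs(self, feats):
        scores = {t: sum(self.w[(f, t)] for f in feats) for t in TAGS}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {t: math.exp(s - top) for t, s in scores.items()}
        z = sum(exps.values())
        return {t: e / z for t, e in exps.items()}

    def train(self, data, epochs=100, lr=0.2, l2=0.01):
        """Stochastic gradient ascent on the log-likelihood, with a small L2
        decay on each updated weight (a stand-in for GIS/IIS estimation)."""
        for _ in range(epochs):
            for chars, tags in data:
                for i, gold in enumerate(tags):
                    feats = features(chars, i)
                    p = self.probs(feats)
                    for t in TAGS:
                        # Gradient = observed feature count - expected count under p.
                        g = (1.0 if t == gold else 0.0) - p[t]
                        for f in feats:
                            self.w[(f, t)] += lr * (g - l2 * self.w[(f, t)])

    def tag(self, chars):
        """Greedy per-character decoding (no constraints between adjacent tags)."""
        result = []
        for i in range(len(chars)):
            p = self.probs(features(chars, i))
            result.append(max(p, key=p.get))
        return result


def ne_spans(tags):
    """Convert B/I/O tags into (start, end) character spans, end exclusive."""
    spans, start = [], None
    for i, t in enumerate(tags + ["O"]):  # trailing "O" closes a final entity
        if t in ("B", "O"):
            if start is not None:
                spans.append((start, i))
                start = None
            if t == "B":
                start = i
        elif start is None:  # an "I" without a preceding "B" opens an entity
            start = i
    return spans


def merge_entities(words, spans):
    """Hypothetical integration step: force each NE span to be one word by
    removing cuts inside it and adding cuts at its edges."""
    text = "".join(words)
    cuts, pos = set(), 0
    for w in words:
        pos += len(w)
        cuts.add(pos)
    for s, e in spans:
        cuts -= set(range(s + 1, e))  # no cut inside an entity
        cuts |= {s, e}                # cut at both entity edges
    cuts.discard(0)
    out, prev = [], 0
    for cut in sorted(cuts):
        out.append(text[prev:cut])
        prev = cut
    return out


if __name__ == "__main__":
    # Toy, hand-labelled training data for illustration only.
    data = [
        (list("张三在北京工作"), ["B", "I", "O", "B", "I", "O", "O"]),
        (list("李四去上海"), ["B", "I", "O", "B", "I"]),
    ]
    tagger = MaxEntTagger()
    tagger.train(data)
    spans = ne_spans(tagger.tag(list("张三在北京工作")))
    # Output of a base segmenter that wrongly split the unknown name 张三:
    print(merge_entities(["张", "三在", "北京", "工作"], spans))
```

Greedy decoding can emit an `I` with no preceding `B`; `ne_spans` simply treats it as the start of an entity. A Viterbi or beam search over tag sequences, as is common for MaxEnt taggers, would enforce valid transitions instead.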