Source: Proceedings of the 2007 International Conference on Artificial Intelligence (ICAI'2007)

Improving Chinese Word Segmentation with Description Length Gain


Abstract

Supervised and unsupervised learning have seldom been combined to strengthen each other in the field of Chinese word segmentation (CWS). This paper presents a novel approach to CWS that uses description length gain (DLG), an empirical goodness measure for unsupervised word discovery, to enhance the segmentation performance of conditional random field (CRF) learning. Specifically, we attempt to integrate the lexical information acquired from unsupervised DLG segmentation into the supervised CRF learning of character tagging for CWS. Our experimental results show that making good use of DLG information further improves CRF learning beyond its already state-of-the-art performance in the field.
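To make the integration idea concrete, the following is a minimal sketch of how the output of an unsupervised segmenter could be exposed as an extra per-character feature for a character-tagging model. It is not the authors' feature design: the B/M/E/S tag set, the feature names, and the helper functions `words_to_tags` and `char_features` are assumptions made for illustration, and the unsupervised (for example DLG-based) segmentation is assumed to be given already as a list of words.

```python
from typing import Dict, List


def words_to_tags(words: List[str]) -> List[str]:
    """Map each character of a segmented sentence to a position tag.

    Assumed tag set: B (word begin), M (word middle), E (word end),
    S (single-character word).
    """
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def char_features(sentence: str, unsup_words: List[str]) -> List[Dict[str, str]]:
    """Build one feature dict per character.

    `unsup_words` is a hypothetical unsupervised segmentation of `sentence`;
    its position tags are added alongside plain character-context features.
    """
    if "".join(unsup_words) != sentence:
        raise ValueError("segmentation does not cover the sentence exactly")
    unsup_tags = words_to_tags(unsup_words)
    features: List[Dict[str, str]] = []
    for i, ch in enumerate(sentence):
        features.append({
            "char": ch,
            "prev_char": sentence[i - 1] if i > 0 else "<BOS>",
            "next_char": sentence[i + 1] if i + 1 < len(sentence) else "<EOS>",
            # Lexical information from the unsupervised segmenter.
            "unsup_tag": unsup_tags[i],
        })
    return features
```

Feature dicts of this shape could be passed to any sequence labeller that accepts per-token feature mappings, with the gold-standard B/M/E/S tags from a segmented training corpus as labels. The actual feature templates used in the paper are not stated in this abstract.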
