7th International Conference on Natural Language Processing and Knowledge Engineering

Parsing-based Chinese word segmentation integrating morphological and syntactic information

Abstract

Conventional sequence labeling methods for Chinese word segmentation do not fully exploit linguistic information, which limits further performance gains. Chinese morphology studies the construction and usage of Chinese words in depth, and this knowledge is helpful for word segmentation. Furthermore, some segmentation ambiguities cannot be resolved with lexical information alone; their final disambiguation takes place during parsing. In this paper, we propose a parsing-based Chinese word segmentation model that can fully utilize morphological and syntactic information. Experiments on the Penn Chinese Treebank (CTB) 5.0 show that the proposed model performs competitively with the CRF-based model. To investigate the relationship between our parsing-based model and the CRF-based model, we employ a maximum entropy framework for integrating different knowledge sources. The integrated model obtains an F-measure of 97.9, a 25% reduction in segmentation error rate relative to the CRF-based model, which indicates that the two models are complementary.
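
The error-rate figure can be checked with a little arithmetic. The sketch below is not from the paper: it assumes the segmentation error rate is defined as 100 minus the F-measure, which the abstract does not state, and the function names are hypothetical. Under that assumption, an F-measure of 97.9 with a 25% relative error reduction would imply a CRF-based baseline of roughly 97.2; because the reported 25% is presumably rounded, the implied baseline is only approximate.

```python
def baseline_f_from_reduction(f_new: float, relative_reduction: float) -> float:
    """Back out the baseline F-measure implied by a relative error reduction.

    Assumption (not stated in the abstract): error rate = 100 - F-measure.
    """
    err_new = 100.0 - f_new
    # err_new = err_base * (1 - reduction)  =>  err_base = err_new / (1 - reduction)
    err_base = err_new / (1.0 - relative_reduction)
    return 100.0 - err_base


def relative_error_reduction(f_base: float, f_new: float) -> float:
    """Relative error reduction when moving from f_base to f_new (same assumption)."""
    err_base = 100.0 - f_base
    err_new = 100.0 - f_new
    return (err_base - err_new) / err_base


if __name__ == "__main__":
    implied = baseline_f_from_reduction(97.9, 0.25)
    print(f"implied baseline F-measure: {implied:.1f}")
    print(f"check: {relative_error_reduction(implied, 97.9):.2%} reduction")
```

The second function simply runs the calculation in the forward direction, so the two can be used to cross-check each other.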
