Conference: CIPS-SIGHAN joint conference on Chinese language processing

Leveraging Rich Linguistic Features for Cross-domain Chinese Segmentation


Abstract

This paper describes the system we used for the Chinese segmentation task in the 3rd CIPS-SIGHAN bakeoff. We treat segmentation as character sequence labeling and, to improve accuracy across multiple domains, present a CRF-based Chinese segmentation system that integrates supervised, unsupervised, and lexical features. We first produce a preliminary segmentation of the target data using a CRF model trained on these three types of features; new words are detected in this result and added to the lexicon. To generalize across domains, we then run a second segmentation pass with the updated lexicon. Out-of-vocabulary (OOV) recognition is further improved by refined post-processing. All the features we use share a unified feature template for CRF training. Our system achieves a competitive F-score of 0.9730 in this bakeoff.
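As a rough illustration of the character sequence labeling view and the lexical feature mentioned in the abstract, the sketch below converts segmented words to per-character tags and back, and computes a simple lexicon-match feature. The abstract does not state the tag set or the exact feature definitions, so the B/M/E/S scheme, the function names, and the `max_len` parameter are all assumptions made for this example, not the authors' implementation.

```python
from typing import List, Set


def words_to_tags(words: List[str]) -> List[str]:
    """Map a segmented sentence to one tag per character.

    Assumed tag set (not specified in the abstract):
    B = word begin, M = word middle, E = word end, S = single-character word.
    """
    tags: List[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and their (predicted) tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


def lexicon_feature(chars: str, i: int, lexicon: Set[str], max_len: int = 4) -> int:
    """Hypothetical lexical feature: length of the longest lexicon word
    starting at position i, or 0 if no lexicon word starts there."""
    for n in range(min(max_len, len(chars) - i), 0, -1):
        if chars[i:i + n] in lexicon:
            return n
    return 0
```

In a two-pass setup like the one described, new words found in the first-pass output would be added to `lexicon` before the features are recomputed for the second pass, so `lexicon_feature` can return different values for the same position in each pass.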
