Published in: CIPS-SIGHAN joint conference on Chinese language processing

Leveraging Rich Linguistic Features for Cross-domain Chinese Segmentation


Abstract

This paper describes the system we used for the Chinese segmentation task in the 3rd CIPS-SIGHAN bakeoff. We treat segmentation as character sequence labeling and, to improve accuracy across multiple domains, present a CRF-based Chinese segmentation system that integrates supervised, unsupervised, and lexical features. We first produce a preliminary segmentation of the target data using a CRF model trained on these three types of features; new words are detected in the result and added to the lexicon. To generalize across domains, we then run a second segmentation pass with the updated lexicon. OOV recognition is further improved by refined post-processing. All the features we use share a unified feature template trained by the CRF. Our system achieves a competitive F-score of 0.9730 in this bakeoff.
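The character-labeling view and the lexicon update between the two passes can be illustrated with a minimal Python sketch. The abstract does not state the tag set or the new-word detection criterion, so the B/M/E/S tags, the frequency threshold `min_count`, and all function names below are assumptions made for illustration. They are not the authors' implementation, and the CRF model itself is omitted.

```python
from collections import Counter
from typing import List, Set


def words_to_tags(words: List[str]) -> List[str]:
    """Map a segmented sentence to one tag per character.

    Assumed tag set: B (begin), M (middle), E (end), S (single-character word).
    """
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and their (predicted) tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


def detect_new_words(
    segmented: List[List[str]], lexicon: Set[str], min_count: int = 2
) -> Set[str]:
    """Hypothetical stand-in for new-word detection after the first pass:
    keep multi-character words missing from the lexicon that occur at
    least `min_count` times in the preliminary segmentation."""
    counts = Counter(
        w for sent in segmented for w in sent if len(w) > 1 and w not in lexicon
    )
    return {w for w, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    gold = ["我们", "使用", "条件随机场", "。"]
    tags = words_to_tags(gold)
    print(tags)
    print(tags_to_words("".join(gold), tags))

    first_pass = [["条件随机场", "很", "好"], ["条件随机场", "模型"]]
    lexicon = {"模型"}
    lexicon |= detect_new_words(first_pass, lexicon)  # lexicon for the second pass
    print(sorted(lexicon))
```

In a two-pass setup of this kind, the enlarged lexicon would feed the lexical features of the second segmentation pass. How those features are encoded in the template is not specified in the abstract.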
