Leveraging Rich Linguistic Features for Cross-domain Chinese Segmentation

Abstract

This paper describes the system we used for the Chinese segmentation task in the 3rd CIPS-SIGHAN bakeoff. We treat segmentation as character sequence labeling and, to improve accuracy across multiple domains, present a CRF-based Chinese segmentation system that integrates supervised, unsupervised, and lexical features. We first produce a preliminary segmentation of the target data using a CRF model trained on these three types of features; new words are detected in the result and added to the lexicon. To generalize across domains, we then run a second segmentation pass with the updated lexicon. OOV (out-of-vocabulary) recognition is further improved by refined post-processing. All the features we use share a unified feature template trained by the CRF. Our system achieves a competitive F-score of 0.9730 in this bakeoff.
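The abstract does not give implementation details, so the following is only a minimal sketch of two ideas it mentions: casting segmentation as per-character labeling, and feeding words found in a first pass back into the lexicon. The B/M/E/S tag set, the function names, and the frequency threshold `min_count` are all assumptions made for illustration; the paper's actual tag set and new-word detection criteria may differ, and the CRF itself is not modeled here.

```python
from collections import Counter


def words_to_tags(words):
    """Map a segmented sentence to per-character B/M/E/S labels (assumed tag set)."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")  # single-character word
        else:
            # begin, zero or more middle characters, end
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their B/M/E/S labels."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate a label sequence that ends mid-word
        words.append(current)
    return words


def update_lexicon(lexicon, first_pass, min_count=2):
    """Return (updated lexicon, new words).

    Hypothetical criterion: a multi-character word is treated as new if the
    first pass produced it at least `min_count` times and the lexicon lacks it.
    """
    counts = Counter(w for sent in first_pass for w in sent if len(w) > 1)
    new_words = {w for w, c in counts.items() if c >= min_count and w not in lexicon}
    return lexicon | new_words, new_words
```

In a two-pass setup like the one described, the output of a first-pass model would be passed to something like `update_lexicon`, and the lexical features would then be recomputed from the enlarged lexicon before the second pass.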

Bibliographic record

Source: CIPS-SIGHAN joint conference on Chinese language processing | 2012 | 7 pages
Authors: Guohua Wu; Dezhu He; Keli Zhong; Xue Zhou; Caixia Yuan
Original format: PDF
CLC classification: Chinese language
