7th International Conference on Natural Language Processing and Knowledge Engineering

Using context and semantic resources for cross-domain word segmentation

Abstract

Chinese word segmentation (CWS) plays a fundamental role in Chinese language processing, because almost all downstream tasks assume segmented input. After many years of active research, most systems reported in evaluation tasks achieve impressive results, but those results are largely limited to test corpora from a specific domain; when a system is applied to a different domain, its accuracy drops sharply. For this reason, domain-adaptive word segmentation was introduced into the Bakeoffs. In this paper, we propose a new joint decoding strategy that combines character-based and word-based conditional random field models and uses the part-of-speech of dictionary words as important features of a segmentation path. In addition, in keeping with the characteristics of cross-domain segmentation, context information is used to guide CWS. Because synonyms occur in similar contexts, semantic information can also be used to recall some out-of-vocabulary words (OOVs). Experiments on the simplified Chinese test data from SIGHAN Bakeoff 2010 show that the method is effective: except in the literature domain, the F-scores are higher than the best results of the corresponding open test. The OOV recall rates reach 70.7%, 84.3%, 79.0% and 86.2%, respectively.
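
The F-score and OOV recall figures quoted in the abstract are presumably the usual span-based word segmentation metrics. The following is a minimal sketch of how such scores are commonly computed, not the scoring procedure used in the paper: a predicted word counts as correct only if both of its boundaries match a gold word, and OOV recall is the share of gold words absent from the training vocabulary that are segmented correctly. The function names, the toy sentence, and the vocabulary are all hypothetical.

```python
def to_spans(words):
    """Convert a word list into a set of (start, end) character spans."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def evaluate(gold_sents, pred_sents, vocab):
    """Return (precision, recall, f_score, oov_recall) over a corpus.

    gold_sents, pred_sents: lists of word lists covering the same characters.
    vocab: set of training words; gold words outside it are treated as OOV.
    """
    n_gold = n_pred = n_correct = n_oov = n_oov_hit = 0
    for gold, pred in zip(gold_sents, pred_sents):
        # Both segmentations must cover exactly the same character string.
        assert "".join(gold) == "".join(pred)
        gold_spans, pred_spans = to_spans(gold), to_spans(pred)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
        n_correct += len(gold_spans & pred_spans)
        # Count OOV gold words and how many of them were recovered exactly.
        start = 0
        for w in gold:
            span = (start, start + len(w))
            start += len(w)
            if w not in vocab:
                n_oov += 1
                n_oov_hit += span in pred_spans
    precision = n_correct / n_pred if n_pred else 0.0
    recall = n_correct / n_gold if n_gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    oov_recall = n_oov_hit / n_oov if n_oov else 0.0
    return precision, recall, f_score, oov_recall


# Toy example (hypothetical data): the OOV word "自然语言" is split in two.
gold = [["我", "喜欢", "自然语言", "处理"]]
pred = [["我", "喜欢", "自然", "语言", "处理"]]
vocab = {"我", "喜欢", "自然", "语言", "处理"}
p, r, f, oov_r = evaluate(gold, pred, vocab)
print(f"P={p:.3f} R={r:.3f} F={f:.3f} OOV recall={oov_r:.3f}")
```

In this toy case three of the five predicted words match the four gold words, and the single OOV word is missed, which illustrates why a system can keep a moderate F-score while its OOV recall collapses on a new domain.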
