Conference: International Conference on Natural Language Processing and Knowledge Engineering

Using context and semantic resources for cross-domain word segmentation

Abstract

Chinese word segmentation (CWS) plays a fundamental role in Chinese language processing, because almost all Chinese language processing tasks assume segmented input. After many years of active research, most reports from evaluation tasks give impressive results, but most of these results are limited to test corpora from a specific domain. Once a system is applied to a different domain, its accuracy plummets. Domain-adaptive word segmentation was therefore introduced into the Bakeoffs. In this paper, we propose a new joint decoding strategy that combines character-based and word-based conditional random field models and takes the part-of-speech of dictionary words as important features in a segmentation path. Moreover, in keeping with the characteristics of cross-domain segmentation, context information is used to guide CWS. In addition, because synonyms occur in similar contexts, semantic information can be used to recall some out-of-vocabulary words (OOVs). Experiments on the simplified Chinese test data from SIGHAN Bakeoff 2010 show that the method is effective. Except in the literature domain, the F-scores are higher than the best performance in the corresponding open test. The OOV recall rates reach 70.7%, 84.3%, 79.0% and 86.2%, respectively.
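The abstract reports results as F-scores and OOV recall. As a rough illustration of how such figures are commonly computed for word segmentation, the sketch below scores a predicted segmentation against a gold one by matching character spans. It is a minimal sketch under assumptions, not the official Bakeoff scoring script: the function names `to_spans` and `evaluate`, the `train_vocab` argument, and the toy sentence are all hypothetical, and an OOV word is assumed to be any gold word absent from the training vocabulary.

```python
def to_spans(words):
    """Convert a list of words into (start, end) character spans."""
    spans, start = [], 0
    for word in words:
        spans.append((start, start + len(word)))
        start += len(word)
    return spans


def evaluate(gold_sents, pred_sents, train_vocab):
    """Return precision, recall, F-score and OOV recall.

    A predicted word counts as correct only if both of its boundaries
    match a gold word. OOV recall is the share of gold words missing
    from train_vocab (an assumed definition) that were segmented correctly.
    """
    correct = gold_total = pred_total = 0
    oov_total = oov_hit = 0
    for gold, pred in zip(gold_sents, pred_sents):
        gold_spans = to_spans(gold)
        pred_spans = set(to_spans(pred))
        gold_total += len(gold_spans)
        pred_total += len(pred_spans)
        for word, span in zip(gold, gold_spans):
            hit = span in pred_spans
            correct += int(hit)
            if word not in train_vocab:
                oov_total += 1
                oov_hit += int(hit)
    precision = correct / pred_total if pred_total else 0.0
    recall = correct / gold_total if gold_total else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    oov_recall = oov_hit / oov_total if oov_total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "oov_recall": oov_recall,
    }


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    gold = [["中文", "分词", "很", "重要"]]
    pred = [["中文", "分", "词", "很", "重要"]]
    vocab = {"中文", "很", "重要"}  # "分词" is treated as OOV
    print(evaluate(gold, pred, vocab))
```

Comparing spans rather than word strings keeps repeated words in one sentence from being confused with each other, since a match then requires the same position as well as the same boundaries.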
