ACM Transactions on Asian Language Information Processing

A Unified Model for Solving the OOV Problem of Chinese Word Segmentation

Abstract

This article proposes a unified, character-based, generative model that incorporates additional resources to solve the out-of-vocabulary (OOV) problem of Chinese word segmentation; within this model, different types of additional information can be utilized independently in corresponding submodels. The article mainly addresses three types of OOV words that current approaches do not handle well: unseen dictionary words, named entities, and suffix-derived words. The results show that our approach effectively improves performance on the first two types, with a positive interaction between them in F-score. We also analyze why suffix information is not helpful. After integrating the proposed generative model with the corresponding discriminative approach, our evaluation on various corpora, including SIGHAN-2005, CIPS-SIGHAN-2010, and the Chinese Treebank (CTB), shows that the integrated approach achieves the best performance reported in the literature on all test sets when additional information and resources are allowed.
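Character-based segmenters treat segmentation as tagging each character with its position in a word, and results like those above are usually reported as word-level F-score (plus recall on OOV words). The following is a minimal sketch of that representation and scoring, not the authors' model. It assumes the common B/M/E/S tag set (the abstract does not state the paper's tag set), and all function names are hypothetical. A predicted word counts as correct only if its character span exactly matches a gold word.

```python
from typing import List, Set, Tuple

def words_to_tags(words: List[str]) -> List[str]:
    """Map a segmented sentence to one tag per character (assumed B/M/E/S scheme)."""
    tags: List[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")                   # single-character word
        else:
            tags.append("B")                   # word-initial character
            tags.extend(["M"] * (len(w) - 2))  # word-internal characters
            tags.append("E")                   # word-final character
    return tags

def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from characters and their B/M/E/S tags."""
    words: List[str] = []
    current = ""
    for c, t in zip(chars, tags):
        current += c
        if t in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # tolerate an ill-formed tail with no closing E/S
        words.append(current)
    return words

def spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Character-offset spans (start, end) of each word in the sentence."""
    result: Set[Tuple[int, int]] = set()
    pos = 0
    for w in words:
        result.add((pos, pos + len(w)))
        pos += len(w)
    return result

def segmentation_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Word-level precision, recall and F-score over exact span matches."""
    g, p = spans(gold), spans(pred)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

def oov_recall(gold: List[str], pred: List[str], vocab: Set[str]) -> float:
    """Fraction of gold words absent from the training vocabulary that the
    prediction segments with exactly the right span."""
    pred_spans = spans(pred)
    oov_total = oov_hit = 0
    pos = 0
    for w in gold:
        if w not in vocab:
            oov_total += 1
            if (pos, pos + len(w)) in pred_spans:
                oov_hit += 1
        pos += len(w)
    return oov_hit / oov_total if oov_total else 0.0
```

For example, with gold segmentation `["我们", "研究", "中文", "分词"]` and prediction `["我们", "研究", "中文分词"]`, two of three predicted words match, giving precision 2/3, recall 1/2 and F-score 4/7. If `分词` is absent from the training vocabulary, the merged prediction also scores 0 OOV recall, which is the kind of error the OOV submodels target.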