Pacific Asia Conference on Language, Information and Computation

A Novel Schema-Oriented Approach for Chinese New Word Identification

Abstract

With the popularity of network applications, new words have become increasingly common, and they degrade the performance of natural language processing applications, including web search. Automatically identifying new words in text remains a very challenging problem, especially for Chinese. In this paper, we propose a novel schema-oriented approach for Chinese new word identification, named "ChNWI". The approach has three main steps: (1) we propose three composition schemas that cover nearly all Chinese word surfaces of two to four characters; (2) we employ a support vector machine (SVM) to classify Chinese new words under each of the three schemas using their unique linguistic characteristics; and (3) we design various rules to filter the identified new words of each schema. Extensive evaluations on two corpora (Chinese news titles and CIPS-SIGHAN 2012 CSMB) demonstrate the efficiency of ChNWI for Chinese new word identification.
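The abstract does not spell out the three schemas, the SVM features, or the filtering rules. The following is therefore only a minimal sketch of how a generate, classify, and filter pipeline of this shape could be wired together, under loudly stated assumptions. Candidates are taken to be all contiguous 2 to 4 character substrings, a crude stand-in for the paper's schemas. The classifier is a pluggable `classify(word, freq)` callback where a trained SVM would go. The `STOP_CHARS` set, `min_freq` threshold, and lexicon check in `passes_rules` are hypothetical rules invented for illustration, not the authors' rules.

```python
from collections import Counter
from typing import Callable, Iterable

# HYPOTHETICAL stop characters: common function characters that rarely begin or end
# a new word. Illustrative only; the paper's actual filtering rules are not given.
STOP_CHARS = set("的了是在和也就都")


def candidate_surfaces(texts: Iterable[str], min_len: int = 2, max_len: int = 4) -> Counter:
    """Count every contiguous substring of min_len..max_len characters.

    A crude stand-in (an assumption) for the paper's three composition schemas,
    which the abstract only describes as covering 2- to 4-character word surfaces.
    """
    counts: Counter = Counter()
    for text in texts:
        for n in range(min_len, max_len + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


def passes_rules(word: str, freq: int, lexicon: set, min_freq: int = 2) -> bool:
    """HYPOTHETICAL post-classification filter rules."""
    if word in lexicon:          # already a known word, so it is not "new"
        return False
    if freq < min_freq:          # too rare to trust as a word
        return False
    if word[0] in STOP_CHARS or word[-1] in STOP_CHARS:
        return False             # starts or ends with a function character
    return True


def identify_new_words(
    texts: Iterable[str],
    lexicon: set,
    classify: Callable[[str, int], bool],
    min_freq: int = 2,
) -> list:
    """Generate candidates, keep those the classifier accepts, then apply the rules."""
    counts = candidate_surfaces(texts)
    return sorted(
        w for w, f in counts.items()
        if classify(w, f) and passes_rules(w, f, lexicon, min_freq)
    )
```

As a usage example, passing an accept-everything classifier (`lambda w, f: True`) over the texts "微信支付很方便" and "我用微信支付了", with the lexicon `{"方便", "支付"}`, keeps "微信" and "微信支付". It also keeps fragments such as "信支" and "信支付", because frequency and stop-character rules alone cannot tell a real word from a recurring substring. That gap is what a discriminative classifier, the SVM in the paper's approach, is meant to close.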