Conference: Pacific Asia Conference on Language, Information and Computation

A Novel Schema-Oriented Approach for Chinese New Word Identification

Abstract

With the growing popularity of network applications, new words are becoming more common and degrade the performance of natural language processing applications, including web search. Identifying new words automatically from text remains a very challenging problem, especially for Chinese. In this paper, we propose a novel schema-oriented approach for Chinese new word identification (named "ChNWI"). The approach has three main steps: (1) we suggest three composition schemas that cover nearly all two- to four-character Chinese word surfaces; (2) we employ a support vector machine (SVM) to classify Chinese new words of the three schemas using their unique linguistic characteristics; and (3) we design various rules to filter the identified Chinese new words of the three schemas. Our extensive evaluations on two corpora (Chinese news titles and CIPS-SIGHAN 2012 CSMB) show ChNWI's efficiency on Chinese new word identification.
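
The abstract does not spell out the three composition schemas, the SVM features, or the filtering rules, so the following is only a minimal sketch of the general pipeline shape, not the authors' method. It enumerates two- to four-character candidates, skips entries already in a lexicon, and applies a simple filter. The SVM stage is deliberately left out and replaced by a plain frequency threshold. All function names, the `min_count` threshold, and the `stop_chars` rule are hypothetical choices made for illustration.

```python
from collections import Counter


def candidate_ngrams(text, min_len=2, max_len=4):
    """Yield every contiguous substring of `text` with 2 to 4 characters."""
    for n in range(min_len, max_len + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]


def count_candidates(titles, known_words):
    """Count candidates across all titles, skipping words already in the lexicon."""
    counts = Counter()
    for title in titles:
        for cand in candidate_ngrams(title):
            if cand not in known_words:
                counts[cand] += 1
    return counts


def rule_filter(counts, min_count=2, stop_chars=frozenset("的了是在")):
    """Hypothetical rule: keep candidates that repeat and that neither
    start nor end with a common function character."""
    return {
        cand: c
        for cand, c in counts.items()
        if c >= min_count
        and cand[0] not in stop_chars
        and cand[-1] not in stop_chars
    }


if __name__ == "__main__":
    # Toy input, invented for illustration only.
    titles = ["给力的新政策", "网友称政策很给力"]
    known_words = {"政策", "网友"}
    print(rule_filter(count_candidates(titles, known_words)))
```

In a fuller version, a trained classifier would sit between counting and filtering and would score each candidate from linguistic features. The frequency threshold here only marks where that step would go.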