Conference: Pacific Asia Conference on Language, Information and Computation

Lexical Gaps and Lexicalization: Implications for Word Segmentation Systems for Chinese NLP

Abstract

This paper is motivated by the observation that not all adjectives in Chinese have a canonical antonym. For example, most Chinese speakers choose to translate the English word dishonest as the word string bu chengshi 'not honest' rather than as any of the antonym candidates for chengshi suggested in antonym dictionaries. Our discourse evidence from corpus data suggests that bu chengshi is evolving into a word in discourse at a faster pace than some other 'bu + adjective' strings. This may result from the lexical gap left by the lack of a canonical antonym of chengshi, combined with the communicative need for such a word. We therefore propose that if the lexicalization of bu chengshi continues, a segmentation system may need to treat the string as a single word (i.e., buchengshi 'dishonest'). More generally, for a segmentation system to distinguish between words and phrases, discourse factors should be taken into consideration.
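The practical consequence for a dictionary-based segmenter can be illustrated with a small sketch. It uses greedy forward maximum matching, a common baseline that is not necessarily the kind of system the authors have in mind; the function name `forward_max_match`, the toy lexicon, and the example sentence 他很不诚实 'he is very dishonest' are illustrative assumptions. The point it shows is narrow: whether 不诚实 (bu chengshi) comes out as one token or as 不 + 诚实 depends on whether the lexicon lists it as a word.

```python
def forward_max_match(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Greedy forward maximum matching (illustrative baseline, not the paper's method).

    At each position, take the longest lexicon entry that matches;
    if nothing matches, fall back to a single character.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy lexicon; a real system would use a far larger dictionary.
base_lexicon = {"他", "很", "不", "诚实"}
sentence = "他很不诚实"  # 'He is very dishonest.'

# Without a lexicon entry, bu chengshi is split into a phrase: 不 + 诚实.
print(forward_max_match(sentence, base_lexicon))

# Once 不诚实 is listed as a word, it is kept together as one token.
print(forward_max_match(sentence, base_lexicon | {"不诚实"}))
```

In a purely lexicon-driven segmenter like this one, deciding that buchengshi is a word amounts to adding it as an entry; the discourse evidence the abstract argues for would be one input to deciding which 'bu + adjective' strings deserve that status.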