Lexical Gaps and Lexicalization: Implications for Word Segmentation Systems for Chinese NLP

Abstract

This paper is motivated by the observation that not all adjectives in Chinese have a canonical antonym. For example, most Chinese speakers translate the English word 'dishonest' as the word string bu chengshi 'not honest' rather than as any of the antonym candidates for chengshi suggested in antonym dictionaries. Our discourse evidence from corpus data suggests that bu chengshi is evolving into a word in discourse at a faster pace than some other 'bu + adjective' strings, and this may result from the lexical gap for a canonical antonym of chengshi and the communicative need for such a word. Consequently, it is proposed that if the lexicalization of bu chengshi continues, the string may need to be treated as a single word in a segmentation system (i.e., buchengshi 'dishonest'). For a segmentation system to distinguish between words and phrases, discourse factors should be taken into consideration.
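
To make the segmentation consequence concrete, here is a minimal sketch of a dictionary-based segmenter using forward maximum matching. It is an illustration only, not the method of the paper: the function name, the toy lexicons, the example sentence, and the use of the simplified characters 不诚实 for bu chengshi are all assumptions. It shows how adding a lexicalized entry to the lexicon changes the output from two tokens to one.

```python
def segment(text, lexicon, max_len=4):
    """Forward maximum matching: at each position, take the longest
    lexicon entry that matches; fall back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy lexicons (assumptions for illustration only).
base_lexicon = {"他", "很", "不", "诚实"}          # 'bu' and 'chengshi' are separate words
extended_lexicon = base_lexicon | {"不诚实"}      # 'buchengshi' added as a single word

sentence = "他很不诚实"  # 'He is very dishonest'

print(segment(sentence, base_lexicon))      # bu + chengshi as two tokens
print(segment(sentence, extended_lexicon))  # buchengshi as one token
```

In a lexicon-driven system of this kind, the word-versus-phrase decision reduces to whether the string is listed, which is why the abstract's point about tracking lexicalization in discourse bears directly on how the lexicon is maintained.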

Bibliographic details

Source: Pacific Asia Conference on Language, Information and Computation, 2012, pp. 191-198 (8 pages)
Author: Chan-Chia Hsu
Original format: PDF
