This paper is motivated by the observation that not all adjectives in Chinese have a canonical antonym. For example, most Chinese speakers translate the English word 'dishonest' as the string bu chengshi 'not honest' rather than as any of the antonym candidates for chengshi suggested in antonym dictionaries. Discourse evidence from our corpus data suggests that bu chengshi is evolving into a word at a faster pace than some other 'bu + adjective' strings, which may result from the lexical gap left by the lack of a canonical antonym for chengshi and from the communicative need for such a word. We therefore propose that, if the lexicalization of bu chengshi continues, the string may need to be treated as a single word in a segmentation system (i.e., buchengshi 'dishonest'). For a segmentation system to distinguish between words and phrases, discourse factors should be taken into consideration.