An In-depth Study on Internal Structure of Chinese Words

Abstract

Unlike English letters, Chinese characters have rich and specific meanings. Usually, the meaning of a word can be derived, at least in part, from its constituent characters. Several previous works on syntactic parsing have proposed annotating shallow word-internal structures to make better use of character-level information. This work proposes modeling the deep internal structures of Chinese words as dependency trees with 11 labels that distinguish syntactic relationships. First, based on newly compiled annotation guidelines, we manually annotate a word-internal structure treebank (WIST) consisting of over 30K multi-char words from the Chinese Penn Treebank. To guarantee quality, each word is independently annotated by two annotators, and inconsistencies are resolved by a third, senior annotator. Second, we present a detailed analysis of WIST that reveals insights into Chinese word formation. Third, we propose word-internal structure parsing as a new task and conduct benchmark experiments using a competitive dependency parser. Finally, we present two simple ways to encode word-internal structures, both leading to promising gains on the sentence-level syntactic parsing task.
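The abstract does not list the 11 relation labels or the WIST file format, so the following is only a minimal Python sketch under stated assumptions. Each multi-char word is stored as a character-level dependency tree in a CoNLL-like head array (1-based heads, 0 for the root). The names in HYPOTHETICAL_LABELS are placeholders, not the paper's label set. The classes and functions (WordTree, needs_adjudication, route) are hypothetical, and the analyses of 牛肉 (beef) and 读书 (read books) are illustrative, not taken from WIST. The sketch checks that an annotation forms a single-rooted tree and, mirroring the double-annotation procedure described in the abstract, routes words on which two annotators disagree to a senior annotator.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical placeholder labels: the paper uses 11 labels whose names are
# not given in the abstract.
HYPOTHETICAL_LABELS = {"subj", "obj", "att", "adv", "coo", "frag", "root"}


@dataclass
class WordTree:
    """Hypothetical dependency tree over the characters of one word.

    heads[i] is the 1-based index of the head of character i+1 (0 = root);
    labels[i] is the relation between character i+1 and its head.
    """
    chars: str
    heads: List[int]
    labels: List[str]

    def is_well_formed(self) -> bool:
        n = len(self.chars)
        if n < 2:  # only multi-char words are annotated
            return False
        if len(self.heads) != n or len(self.labels) != n:
            return False
        if any(h < 0 or h > n for h in self.heads):
            return False
        # Exactly one character may attach to the artificial root.
        if sum(1 for h in self.heads if h == 0) != 1:
            return False
        # Every character must reach the root by following heads, with no cycle.
        for start in range(1, n + 1):
            seen = set()
            node = start
            while node != 0:
                if node in seen:
                    return False
                seen.add(node)
                node = self.heads[node - 1]
        return all(label in HYPOTHETICAL_LABELS for label in self.labels)


def needs_adjudication(a: WordTree, b: WordTree) -> bool:
    """True if two independent annotations of the same word disagree."""
    return (a.heads, a.labels) != (b.heads, b.labels)


def route(pairs: List[Tuple[WordTree, WordTree]]) -> Tuple[List[str], List[str]]:
    """Split words into agreed ones and ones sent to the senior annotator."""
    agreed, disputed = [], []
    for a, b in pairs:
        (disputed if needs_adjudication(a, b) else agreed).append(a.chars)
    return agreed, disputed


# Illustrative annotations (not from WIST): 牛 modifies 肉; 书 is the object of 读.
beef_a = WordTree("牛肉", [2, 0], ["att", "root"])
beef_b = WordTree("牛肉", [2, 0], ["att", "root"])
read_a = WordTree("读书", [0, 1], ["root", "obj"])
read_b = WordTree("读书", [2, 0], ["att", "root"])
agreed, disputed = route([(beef_a, beef_b), (read_a, read_b)])
print(agreed, disputed)  # ['牛肉'] ['读书']
```

Comparing whole annotations is the strictest agreement criterion; a real pipeline might instead compare heads and labels character by character to report unlabeled and labeled agreement rates.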