Conference: 9th International Conference on Language Resources and Evaluation

Parsing Chinese Synthetic Words with a Character-based Dependency Model

Abstract

Synthetic word analysis is a potentially important but relatively unexplored problem in Chinese natural language processing. Conventional pipeline methods that rely on word segmentation have two issues: (1) the lack of a common segmentation standard and (2) poor segmentation performance on out-of-vocabulary (OOV) words. These issues may be circumvented by adopting character-based parsing, which provides both the internal structure of synthetic words and the global structure of sentences in a seamless fashion. However, the accuracy of synthetic word parsing is not yet satisfactory, owing to a lack of research. In view of this, we propose several synthetic word parsers and present experiments on them. Additionally, we demonstrate the usefulness of incorporating large unlabelled corpora and a dictionary for this task. Our parsers significantly outperform the baseline (a pipeline method).
