Chinese Comma Disambiguation for Discourse Analysis

Abstract

The Chinese comma signals the boundary of discourse units and also anchors discourse relations between adjacent text spans. In this work, we propose a discourse structure-oriented classification of the comma that can be automatically extracted from the Chinese Treebank based on syntactic patterns. We then experimented with two supervised learning methods that automatically disambiguate the Chinese comma based on this classification. The first method integrates comma classification into parsing, and the second method adopts a "post-processing" approach that extracts features from automatic parses to train a classifier. The experimental results show that the second approach compares favorably against the first approach.
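To make the second, "post-processing" approach more concrete, here is a minimal sketch of the feature-extraction step: it reads a Penn-style bracketed parse and records, for each comma, the labels of its parent and of its immediate left and right siblings. Everything here is an illustrative assumption rather than the authors' implementation: the names `Node`, `parse_bracketed`, and `comma_features` are hypothetical, the three features are only a plausible subset, and the abstract does not specify the actual feature set or classifier. In a real pipeline the resulting feature dictionaries would be passed to a supervised classifier trained on comma labels derived from the Chinese Treebank.

```python
from typing import Dict, List, Optional


class Node:
    """A constituent (or a POS-tagged token) in a bracketed parse tree."""

    def __init__(self, label: str, word: Optional[str] = None):
        self.label = label
        self.word = word
        self.children: List["Node"] = []


def parse_bracketed(text: str) -> Node:
    """Parse a Penn-style bracketing such as (NP (NN a) (PU ,) (NN b)).

    Hypothetical helper: assumes well-formed input and that no token
    itself contains a parenthesis.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        node = Node(tokens[pos])  # constituent or POS label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read())
            else:
                node.word = tokens[pos]  # leaf token under a POS tag
                pos += 1
        pos += 1  # consume the closing bracket
        return node

    return read()


COMMAS = {",", "\uff0c"}  # ASCII comma and the full-width Chinese comma


def comma_features(root: Node) -> List[Dict[str, str]]:
    """Collect one feature dictionary per comma found in the tree.

    The features (parent label, left and right sibling labels) are an
    assumed, simplified stand-in for a real feature set.
    """
    features: List[Dict[str, str]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        kids = node.children
        for i, child in enumerate(kids):
            if child.label == "PU" and child.word in COMMAS:
                features.append({
                    "parent": node.label,
                    "left": kids[i - 1].label if i > 0 else "NONE",
                    "right": kids[i + 1].label if i + 1 < len(kids) else "NONE",
                })
        stack.extend(reversed(kids))  # visit children left to right
    return features


# Toy parse of two clauses joined by a comma (invented for illustration).
example = parse_bracketed(
    "(IP (IP (NP (NN 天气)) (VP (VA 好))) (PU ，) (IP (NP (PN 我们)) (VP (VV 出发))))"
)
print(comma_features(example))
```

In this toy parse the comma sits between two IP (clause) nodes under an IP parent, which is the kind of syntactic pattern a classifier could learn to associate with a discourse-unit boundary; a comma between two NN siblings inside an NP would yield different features.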
