Journal: IEICE Transactions on Information and Systems

Correcting Syntactic Annotation Errors Based on Tree Mining



Abstract

This paper proposes a new method for correcting annotation errors in a treebank. The previous error correction method constructs a pseudo-parallel corpus in which incorrect partial parse trees are paired with correct ones, extracts error correction rules from that corpus, and corrects errors by applying the rules to the treebank. However, it does not achieve wide coverage. To widen coverage, our method takes a different approach: if an infrequent pattern can be transformed into a frequent one, we regard it as an annotation error pattern. Using a tree mining technique, our method finds such infrequent tree patterns and constructs error correction rules, each consisting of an infrequent pattern and its corresponding frequent pattern. In an experiment on the Penn Treebank, we obtained 1,987 rules that the previous method does not construct, and these rules achieved good precision.
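The core idea of the abstract, pairing an infrequent pattern with a frequent one it can be transformed into, can be illustrated with a small sketch. The abstract does not say which tree mining algorithm or which transformations the authors use, so everything below is an assumption made for illustration: patterns are restricted to depth-one subtrees (a parent label plus its child labels), a "transformation" is taken to be a single label substitution, and the names `local_patterns`, `differs_by_one_label`, `extract_rules`, `max_rare`, and `min_frequent` are hypothetical. The toy treebank is invented and is not Penn Treebank data.

```python
from collections import Counter

# A tree is a nested tuple (label, child, child, ...); a leaf is a POS tag string.
# ASSUMPTION: patterns are depth-one subtrees, not the paper's mined tree patterns.
def local_patterns(tree):
    """Yield (parent label, tuple of child labels) for every internal node."""
    if isinstance(tree, str):
        return
    label, *children = tree
    child_labels = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, child_labels)
    for child in children:
        yield from local_patterns(child)

# ASSUMPTION: "can be transformed" means exactly one label is substituted.
def differs_by_one_label(a, b):
    """True if both patterns have the same shape and differ in exactly one label."""
    flat_a = (a[0],) + a[1]
    flat_b = (b[0],) + b[1]
    if len(flat_a) != len(flat_b):
        return False
    return sum(x != y for x, y in zip(flat_a, flat_b)) == 1

def extract_rules(treebank, max_rare=1, min_frequent=3):
    """Pair each rare pattern with the most frequent pattern one substitution away."""
    counts = Counter(p for tree in treebank for p in local_patterns(tree))
    rare = [p for p, n in counts.items() if n <= max_rare]
    frequent = [p for p, n in counts.items() if n >= min_frequent]
    rules = []
    for r in rare:
        candidates = [f for f in frequent if differs_by_one_label(r, f)]
        if candidates:
            rules.append((r, max(candidates, key=lambda f: counts[f])))
    return rules

# Invented toy treebank: three consistent trees and one with NP mislabeled as PP.
good = ("S", ("NP", "DT", "NN"), ("VP", "VBD"))
bad = ("S", ("PP", "DT", "NN"), ("VP", "VBD"))
treebank = [good, good, good, bad]

rules = extract_rules(treebank)
for infrequent, frequent in rules:
    print(infrequent, "->", frequent)
```

On this toy input the sketch proposes rewriting both patterns that involve the stray `PP` label into their `NP` counterparts. The thresholds are arbitrary, and a real treebank would need richer patterns and a check that the rare pattern is actually an error and not a legitimate rare construction.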
