Venue: Workshop on language technology resources and tools for digital humanities

Automatic parsing as an efficient pre-annotation tool for historical texts




Historical treebanks tend to be manually annotated, which is not surprising, since state-of-the-art parsers are not accurate enough to ensure high-quality annotation for historical texts. We test whether automatic parsing can be an efficient pre-annotation tool for Old East Slavic texts. We use the TOROT treebank from the PROIEL treebank family. We convert the PROIEL format to the CONLL format and use MaltParser to create syntactic pre-annotation. Using the most conservative evaluation method, which takes into account PROIEL-specific features, MaltParser by itself yields 0.845 unlabelled attachment score, 0.779 labelled attachment score and 0.741 secondary dependency accuracy (note, though, that the test set comes from a relatively simple genre and contains rather short sentences). Experiments with human annotators show that preparsing, if limited to sentences where no changes to word or sentence boundaries are required, increases their annotation rate. For experienced annotators, the speed gain varies from 5.80% to 16.57%, for inexperienced annotators from 14.61% to 32.17% (using conservative estimates). There are no strong reliable differences in the annotation accuracy, which means that there is no reason to suspect that using preparsing might lower the final annotation quality.
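The attachment scores quoted above can be illustrated with a minimal sketch. It assumes gold and parser output are both in the 10-column, tab-separated CoNLL-X layout (HEAD in column 7, DEPREL in column 8) with identical tokenization. The function names `read_conll` and `attachment_scores` are hypothetical and are not taken from the paper. This is the plain textbook UAS/LAS definition. It does not reproduce the conservative, PROIEL-specific evaluation described in the abstract, and it does not compute secondary dependency accuracy.

```python
from typing import List, Tuple

# One token is reduced to (head index, dependency label).
Token = Tuple[int, str]


def read_conll(text: str) -> List[List[Token]]:
    """Parse CoNLL-X text into sentences of (head, deprel) pairs.

    Assumes 10 tab-separated columns per token line and a blank
    line between sentences (an assumption about the input files).
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        current.append((int(cols[6]), cols[7]))  # HEAD, DEPREL
    if current:
        sentences.append(current)
    return sentences


def attachment_scores(gold: List[List[Token]],
                      pred: List[List[Token]]) -> Tuple[float, float]:
    """Return (UAS, LAS) over all tokens.

    UAS counts tokens whose predicted head matches the gold head.
    LAS additionally requires the dependency label to match.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sentence counts differ")
    total = uas_hits = las_hits = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("gold and predicted token counts differ")
        for (g_head, g_label), (p_head, p_label) in zip(g_sent, p_sent):
            total += 1
            if g_head == p_head:
                uas_hits += 1
                if g_label == p_label:
                    las_hits += 1
    if total == 0:
        raise ValueError("no tokens to evaluate")
    return uas_hits / total, las_hits / total
```

Typical use would be to read the gold file and the parser output with `read_conll` and pass both to `attachment_scores`. Sentences whose word or sentence boundaries differ between the two files would need to be aligned or excluded first, which this sketch does not attempt.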


