...
Journal: Language Resources and Evaluation

Ensuring annotation consistency and accuracy for Vietnamese treebank

Abstract

Treebanks are important resources for researchers in natural language processing: they provide training and test material on which different algorithms can be compared. Constructing a high-quality treebank, however, is not a trivial task. A low-resource language such as Vietnamese has so far lacked a proper treebank, which has probably limited the performance of Vietnamese language processing. To address this, we have been building a consistent and accurate Vietnamese treebank. It is annotated in three layers: word segmentation, part-of-speech tagging, and bracketing. For each layer we developed detailed annotation guidelines that present the relevant Vietnamese linguistic issues and the methods for handling them. We also describe approaches to controlling annotation quality while maintaining a reasonable annotation speed: we designed an annotation process and an effective procedure for training annotators, and we implemented several support tools that speed up annotation and help keep the treebank consistent. Experimental results showed that both inter-annotator agreement and accuracy exceeded 90%, indicating that the treebank is reliable.
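The abstract reports inter-annotator agreement above 90% but does not say how agreement was computed for each layer. As an illustration only, here is a minimal sketch of one common measure for the bracketing layer: PARSEVAL-style labeled-bracket F1 between two annotators' trees of the same sentence. The Penn-style bracketed input, the labels (S, NP, VP, P, V, N) and the function names are assumptions made for this example; they are not the treebank's actual tools or tagset.

```python
from collections import Counter


def tokenize(tree: str) -> list[str]:
    # Split a Penn-style bracketed string into "(", ")" and atom tokens.
    return tree.replace("(", " ( ").replace(")", " ) ").split()


def labeled_spans(tree: str) -> Counter:
    """Return a multiset of (label, start, end) word spans for every phrasal
    constituent. Preterminals (a tag directly over a word) are skipped, as in
    PARSEVAL-style scoring, so POS-tag differences do not count here."""
    tokens = tokenize(tree)
    spans: Counter = Counter()
    pos = 0         # current index into tokens
    word_index = 0  # number of words (leaves) consumed so far

    def parse() -> None:
        nonlocal pos, word_index
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        start = word_index
        is_preterminal = False
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                parse()           # nested constituent
            else:
                word_index += 1   # a leaf word
                is_preterminal = True
                pos += 1
        pos += 1  # consume ")"
        if not is_preterminal:
            spans[(label, start, word_index)] += 1

    parse()
    return spans


def bracket_agreement(tree_a: str, tree_b: str) -> float:
    """Labeled-bracket F1 between two annotations of the same sentence."""
    a, b = labeled_spans(tree_a), labeled_spans(tree_b)
    if not a and not b:
        return 1.0
    matched = sum((a & b).values())  # multiset intersection
    precision = matched / sum(b.values()) if b else 0.0
    recall = matched / sum(a.values()) if a else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: annotator B attaches the object NP outside the VP.
tree_a = "(S (NP (P Tôi)) (VP (V đọc) (NP (N sách))))"
tree_b = "(S (NP (P Tôi)) (VP (V đọc)) (NP (N sách)))"
print(bracket_agreement(tree_a, tree_b))  # → 0.75
```

Three of the four constituents match (both NPs and S), so precision and recall are each 3/4. Agreement for the word-segmentation and POS layers would need separate measures (for example segment-level F1 and token-level tag agreement); which ones the authors used is not stated in this abstract.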
