Conference proceedings: Knowledge engineering and management by the masses

A Methodology towards Effective and Efficient Manual Document Annotation: Addressing Annotator Discrepancy and Annotation Quality

Abstract

Manual document annotation is an essential technique for knowledge acquisition and capture. Creating high-quality annotations is difficult because of inter-annotator discrepancy: annotators never agree completely on what to annotate and exactly how to annotate it. To address this, traditional document annotation has multiple domain experts work on the same annotation task iteratively and collaboratively, identifying and resolving discrepancies progressively. However, this detailed process is often ineffective despite the significant time and effort it takes, and in many cases discrepancies remain high. This paper proposes an alternative approach to document annotation. The approach first studies each annotator's suitability for the types of information to be annotated; it then identifies and isolates the most inconsistent annotators, who tend to cause the majority of discrepancies in a task; finally, it distributes the annotation workload among the most suitable annotators. Testing the approach on a named entity annotation task in the archaeology domain, we show that, compared with the traditional approach, it produces larger amounts of better-quality annotations that yield higher machine learning accuracy, while requiring significantly less time and effort.
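The abstract does not say how discrepancy or suitability is measured. As a minimal sketch, assuming exact-match pairwise F1 between annotators (a common agreement measure for named entity annotation) serves as both the discrepancy and the suitability signal, the three steps could look like the code below. The span format, the function names (`pairwise_f1`, `agreement_by_type`, `most_inconsistent`, `assign_types`) and the toy data are all hypothetical illustrations, not the authors' implementation.

```python
from itertools import combinations
from statistics import mean

# Hypothetical data model (not from the paper): an annotation is an exact
# (doc_id, start, end, entity_type) span; each annotator has a set of them.
Span = tuple[str, int, int, str]


def pairwise_f1(a: set[Span], b: set[Span]) -> float:
    """Exact-match F1 between two annotators, treating either one as reference.

    With one side as reference, P = |a&b|/|b| and R = |a&b|/|a|, so
    F1 = 2|a&b| / (|a| + |b|), which is symmetric in a and b.
    """
    if not a and not b:
        return 1.0  # both left this entity type empty: count as full agreement
    return 2 * len(a & b) / (len(a) + len(b))


def agreement_by_type(annotations: dict[str, set[Span]], entity_type: str) -> dict[str, float]:
    """Mean pairwise F1 of each annotator against every other, for one entity type."""
    if len(annotations) < 2:
        raise ValueError("need at least two annotators to measure agreement")
    filtered = {n: {s for s in spans if s[3] == entity_type} for n, spans in annotations.items()}
    scores: dict[str, list[float]] = {n: [] for n in filtered}
    for x, y in combinations(filtered, 2):
        f1 = pairwise_f1(filtered[x], filtered[y])
        scores[x].append(f1)
        scores[y].append(f1)
    return {n: mean(vals) for n, vals in scores.items()}


def most_inconsistent(annotations: dict[str, set[Span]], entity_types: list[str]) -> str:
    """Step 2 (assumed form): annotator with the lowest agreement averaged over types."""
    per_type = {t: agreement_by_type(annotations, t) for t in entity_types}
    overall = {n: mean(per_type[t][n] for t in entity_types) for n in annotations}
    return min(overall, key=overall.get)


def assign_types(annotations: dict[str, set[Span]], entity_types: list[str],
                 exclude: frozenset[str] = frozenset()) -> dict[str, str]:
    """Steps 1 and 3 (assumed form): give each entity type to the annotator who
    agrees most with the remaining peers on that type, after isolating `exclude`."""
    remaining = {n: s for n, s in annotations.items() if n not in exclude}
    assignment = {}
    for t in entity_types:
        scores = agreement_by_type(remaining, t)
        # On ties, max() keeps the first maximal annotator in dict order.
        assignment[t] = max(scores, key=scores.get)
    return assignment


# Toy example: A is strong on PER, B on LOC, D marks spans nobody else marks.
p1, p2 = ("d1", 0, 5, "PER"), ("d2", 3, 8, "PER")
l1, l2 = ("d1", 10, 15, "LOC"), ("d2", 20, 25, "LOC")
annotations = {
    "A": {p1, p2, l1},
    "B": {p1, l1, l2},
    "C": {p1, p2, l1, l2},
    "D": {("d1", 1, 5, "PER"), ("d2", 30, 35, "LOC")},
}
types = ["PER", "LOC"]
outlier = most_inconsistent(annotations, types)
print(outlier, assign_types(annotations, types, frozenset({outlier})))
# prints: D {'PER': 'A', 'LOC': 'B'}
```

The isolated annotator is dropped before per-type scores are recomputed, so their spans no longer drag down everyone else's agreement. Chance-corrected measures such as Cohen's kappa, or partial-overlap span matching, are equally plausible choices; the abstract does not indicate which one the paper uses.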
