Source: PLoS Computational Biology

How to Get the Most out of Your Curation Effort


Abstract

Large-scale annotation efforts typically involve several experts who may disagree with each other. We propose an approach for modeling disagreements among experts that assigns each annotation a confidence value (i.e., the posterior probability that it is correct). The approach computes a certainty level for each individual annotation, given annotator-specific parameters estimated from data. We developed two probabilistic models for this analysis, compared them using computer simulation, and tested each model's actual performance on a large data set generated by human annotators specifically for this study. We show that even in the worst-case scenario, when all annotators disagree, our approach significantly increases the probability of choosing the correct annotation. Along with this publication, we make publicly available a corpus of 10,000 sentences annotated according to several cardinal dimensions that we introduced in earlier work. All 10,000 sentences were annotated 3-fold by a group of eight experts, and a 1,000-sentence subset was further annotated 5-fold by five new experts. Although the presented data represent a specialized curation task, our modeling approach is general; most data annotation studies could benefit from our methodology.
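The idea of a per-annotation confidence value can be illustrated with a small sketch. This is not either of the two probabilistic models from the paper, whose details the abstract does not give. It is a simplified stand-in that assumes annotators err independently, that each annotator's accuracy is already known, that errors are spread uniformly over the wrong labels, and that the prior over labels is uniform. The function name `label_posterior` and the example accuracies are hypothetical.

```python
def label_posterior(votes, accuracy, labels):
    """Posterior probability of each candidate label being correct.

    votes:    dict mapping annotator id -> label that annotator chose
    accuracy: dict mapping annotator id -> assumed P(annotator is correct)
    labels:   list of all candidate labels (at least two)

    Assumptions (illustrative, not taken from the paper): independent
    annotators, uniform prior, errors uniform over the wrong labels.
    """
    k = len(labels)
    unnormalized = {}
    for truth in labels:
        p = 1.0 / k  # uniform prior over labels
        for annotator, vote in votes.items():
            acc = accuracy[annotator]
            # Likelihood of this vote if `truth` were the correct label.
            p *= acc if vote == truth else (1.0 - acc) / (k - 1)
        unnormalized[truth] = p
    total = sum(unnormalized.values())
    return {label: p / total for label, p in unnormalized.items()}


# Worst case: three annotators, three labels, complete disagreement.
votes = {"ann1": "A", "ann2": "B", "ann3": "C"}
accuracy = {"ann1": 0.9, "ann2": 0.7, "ann3": 0.6}  # hypothetical values
posterior = label_posterior(votes, accuracy, ["A", "B", "C"])
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

Under these assumed accuracies, the label chosen by the most reliable annotator gets a posterior of about 0.70, compared with 1/3 for a random pick among the three conflicting votes. In practice the accuracies would have to be estimated from the annotation data rather than supplied by hand.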
