Venue: AAAI Fall Symposium on Semantics for Big Data

Measuring Crowd Truth for Medical Relation Extraction


Abstract

One of the critical steps in analytics for big data is creating a human-annotated ground truth. Crowdsourcing has proven to be a scalable and cost-effective approach to gathering ground truth data, but most annotation tasks are based on the assumption that each annotated instance has a single right answer. From this assumption it has always followed that ground truth quality can be measured by inter-annotator agreement, and, unfortunately, crowdsourcing typically results in high disagreement. We have been working from a different assumption: that disagreement is not noise but signal, and that crowdsourcing can in fact be not only cheaper and more scalable but also of higher quality. In this paper we present a framework for continuously gathering, analyzing and understanding large amounts of gold standard annotation disagreement data. We discuss experimental results demonstrating that human disagreement on annotation tasks carries useful information. Our results show 0.98 accuracy in detecting low-quality crowdsource workers, and an F-measure of 87 at recognizing sentences that are useful for training relation extraction systems.
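The abstract does not spell out the framework's metrics. As a minimal sketch of what treating disagreement as signal can look like, the Python below assumes each worker's judgment on a sentence is a binary vector over a fixed set of candidate relations and sums those vectors per sentence. It then scores (a) each worker by the mean cosine similarity between their vector and the rest of the crowd's vector on the same sentences, and (b) each sentence's clarity as the highest cosine between its vector and any single relation. The relation labels, the 0.2 threshold, the toy data, and all function names are hypothetical. This is an illustration in the spirit of the approach described, not the paper's exact method.

```python
import math
from collections import defaultdict

# Hypothetical candidate relations; the abstract does not list the paper's label set.
RELATIONS = ["treats", "causes", "prevents", "none"]


def annotation_vector(chosen):
    """Binary vector over RELATIONS for one worker's judgment on one sentence."""
    return [1 if r in chosen else 0 for r in RELATIONS]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def sentence_vectors(annotations):
    """Sum worker vectors per sentence. annotations: [(worker, sentence, set_of_relations)]."""
    totals = defaultdict(lambda: [0] * len(RELATIONS))
    for _worker, sent, chosen in annotations:
        totals[sent] = [a + b for a, b in zip(totals[sent], annotation_vector(chosen))]
    return dict(totals)


def worker_agreement(annotations):
    """Mean cosine between each worker's vector and the rest of the crowd on the same sentence."""
    totals = sentence_vectors(annotations)
    scores = defaultdict(list)
    for worker, sent, chosen in annotations:
        w = annotation_vector(chosen)
        rest = [t - x for t, x in zip(totals[sent], w)]  # crowd vector without this worker
        scores[worker].append(cosine(w, rest))
    return {w: sum(s) / len(s) for w, s in scores.items()}


def sentence_clarity(annotations):
    """Highest cosine between a sentence vector and any single-relation unit vector."""
    return {
        sent: max(cosine(vec, annotation_vector({r})) for r in RELATIONS)
        for sent, vec in sentence_vectors(annotations).items()
    }


def flag_low_quality(annotations, threshold=0.2):
    """Workers whose mean agreement falls below a (hypothetical) threshold."""
    return sorted(w for w, s in worker_agreement(annotations).items() if s < threshold)


# Toy data, invented for illustration only.
EXAMPLE = [
    ("w1", "s1", {"treats"}), ("w2", "s1", {"treats"}),
    ("w3", "s1", {"treats"}), ("w4", "s1", {"causes"}),
    ("w1", "s2", {"treats"}), ("w2", "s2", {"causes"}),
    ("w3", "s2", {"prevents"}), ("w4", "s2", {"none"}),
]

if __name__ == "__main__":
    print(worker_agreement(EXAMPLE))
    print(sentence_clarity(EXAMPLE))
    print(flag_low_quality(EXAMPLE))
```

On the toy data, worker w4 disagrees with everyone else on both sentences and is the only one flagged. Sentence s1, where three of four workers choose the same relation, scores as clearer than s2, where all four answers differ. A low clarity score is not treated as an error here: under the disagreement-as-signal view, it may indicate an ambiguous sentence rather than careless workers.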