International Joint Conference on Natural Language Processing; Annual Meeting of the Association for Computational Linguistics

Cross-replication Reliability - An Empirical Approach to Interpreting Inter-rater Reliability

Abstract

When collecting annotations and labeled data from humans, a standard practice is to use inter-rater reliability (IRR) as a measure of data goodness (Hallgren, 2012). Metrics such as Krippendorff's alpha or Cohen's kappa are typically required to be above a threshold of 0.6 (Landis and Koch, 1977). These absolute thresholds are unreasonable for crowd-sourced data from annotators with high cultural and training variances, especially on subjective topics. We present a new alternative to interpreting IRR that is more empirical and contextualized. It is based upon benchmarking IRR against baseline measures in a replication, one of which is a novel cross-replication reliability (xRR) measure based on Cohen's (1960) kappa. We call this approach the xRR framework. We open-source a replication dataset of 4 million human judgements of facial expressions and analyze it with the proposed framework. We argue this framework can be used to measure the quality of crowdsourced datasets.
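As a concrete point of reference for the agreement statistics the abstract mentions, the minimal Python sketch below computes plain Cohen's (1960) kappa for two raters and compares it with the conventional 0.6 threshold of Landis and Koch (1977). It is an illustration only, not the paper's xRR measure, and the facial-expression labels are hypothetical.

# Minimal sketch: Cohen's kappa between two raters, checked against the
# conventional 0.6 threshold (Landis and Koch, 1977). This is the standard
# two-rater statistic, not the paper's cross-replication (xRR) measure.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's (1960) kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items labeled identically by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's label marginals.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b.get(label, 0) for label in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

if __name__ == "__main__":
    # Hypothetical facial-expression labels from two annotators.
    a = ["joy", "anger", "joy", "neutral", "joy", "sadness", "neutral", "joy"]
    b = ["joy", "anger", "neutral", "neutral", "joy", "anger", "neutral", "joy"]
    kappa = cohens_kappa(a, b)
    verdict = "meets the 0.6 threshold" if kappa >= 0.6 else "falls below the 0.6 threshold"
    print(f"kappa = {kappa:.3f} ({verdict})")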
