European conference on information retrieval research

Tolerance of Effectiveness Measures to Relevance Judging Errors

Abstract

Crowdsourcing relevance judgments for test collection construction is attractive because it can be more affordable than hiring high-quality assessors. A problem faced by all crowdsourced judgments, even judgments formed from the consensus of multiple workers, is that they will differ from the judgments produced by high-quality assessors. For two TREC test collections, we simulated errors in sets of judgments and then measured the effect of these errors on effectiveness measures. We found that some measures appear to be more tolerant of errors than others. We also found that achieving high rank correlation in the ranking of retrieval systems requires conservative judgments for average precision (AP) and nDCG, while precision at rank 10 requires neutral judging behavior. Conservative judging avoids mistakenly judging non-relevant documents as relevant at the cost of judging some relevant documents as non-relevant. In addition, we found that while conservative judging behavior maximizes rank correlation for AP and nDCG, minimizing the error in the measures' values requires more liberal behavior. Depending on the nature of a set of crowdsourced judgments, the judgments may be more suitable for some effectiveness measures than others, and the use of some effectiveness measures will require higher levels of judgment quality than others.
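The kind of experiment the abstract describes can be illustrated with a minimal sketch. This is not the authors' simulation: the error model (each judgment flipped independently, with a false-positive rate `fp_rate` and a false-negative rate `fn_rate`), the binary relevance setting, and all function names are assumptions made for illustration. Under this model, a conservative judge would correspond to a low `fp_rate` paired with a higher `fn_rate`, and a liberal judge to the reverse.

```python
import random
from itertools import combinations


def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top-k ranked documents that are judged relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking, relevant):
    """Average precision of one ranking against a set of relevant documents."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def noisy_judgments(docs, relevant, fp_rate, fn_rate, rng):
    """Simulate an error-prone judge (assumed model: independent flips).

    A truly relevant document is missed with probability fn_rate; a truly
    non-relevant document is marked relevant with probability fp_rate.
    """
    judged_relevant = set()
    for doc in docs:
        if doc in relevant:
            if rng.random() >= fn_rate:
                judged_relevant.add(doc)
        elif rng.random() < fp_rate:
            judged_relevant.add(doc)
    return judged_relevant


def kendall_tau(xs, ys):
    """Kendall's tau-a between two equal-length lists of system scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs
```

To compare judging behaviors with this sketch, one would score every system's ranking once against the true judgments and once against `noisy_judgments(...)`, then pass the two lists of per-system scores to `kendall_tau`. Repeating this for several `(fp_rate, fn_rate)` pairs and for both `average_precision` and `precision_at_k` gives a rough picture of how tolerant each measure is under the assumed error model.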
