Conference proceedings: Advances in Information Retrieval

On Aggregating Labels from Multiple Crowd Workers to Infer Relevance of Documents

Abstract

We consider the problem of acquiring relevance judgements for information retrieval (IR) test collections through crowdsourcing when no true relevance labels are available. We collect multiple, possibly noisy relevance labels per document from workers whose labelling accuracy is unknown, and use these labels to infer document relevance with two methods. The first is the commonly used majority voting (MV), which assigns each document the label that received the most votes, treating all workers equally. The second is a probabilistic model that jointly estimates document relevance and worker accuracy using expectation maximization (EM). We run simulations and conduct experiments with crowdsourced relevance labels from the INEX 2010 Book Search track to investigate the accuracy of the resulting relevance assessments and their robustness to noisy labels. We also examine how the derived relevance judgements affect the ranking of search systems. Our experimental results show that the EM method outperforms MV both in the accuracy of relevance assessments and in IR system ranking. The improvements are especially noticeable when the number of labels per document is small and label quality varies.
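To make the contrast between the two aggregation rules concrete, here is a minimal Python sketch. It is not the paper's implementation, and it rests on several assumptions. Labels are binary. The input is a hypothetical list of `(doc_id, worker_id, label)` triples. The probabilistic model is a simplified "one-coin" model in the style of Dawid and Skene, in which each worker reports the true label with a single accuracy `acc[w]`. The paper's actual model, initialisation and tie handling may differ. The functions `majority_vote` and `em_aggregate` and the toy data are illustrative names only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical input format: (doc_id, worker_id, label), label 1 = relevant, 0 = not.
Label = tuple[str, str, int]


def majority_vote(labels: list[Label]) -> dict[str, int]:
    """MV: each document takes the label with the most votes; all workers count equally."""
    votes: dict[str, Counter] = defaultdict(Counter)
    for doc, _worker, y in labels:
        votes[doc][y] += 1
    # Assumption: ties are resolved as "relevant"; the abstract does not specify a rule.
    return {doc: int(c[1] >= c[0]) for doc, c in votes.items()}


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def em_aggregate(labels: list[Label], n_iter: int = 50, eps: float = 1e-6):
    """EM for an assumed one-coin model: worker w reports the true label with prob. acc[w].

    Returns (hard relevance per document, estimated accuracy per worker).
    """
    docs = sorted({d for d, _, _ in labels})
    workers = sorted({w for _, w, _ in labels})

    # Initialise P(relevant | labels) with the fraction of "relevant" votes (a soft MV).
    pos: dict[str, int] = defaultdict(int)
    tot: dict[str, int] = defaultdict(int)
    for d, _, y in labels:
        pos[d] += y
        tot[d] += 1
    q = {d: pos[d] / tot[d] for d in docs}

    acc: dict[str, float] = {}
    for _ in range(n_iter):
        # M-step: class prior and each worker's expected agreement with the posteriors.
        prior = min(max(sum(q.values()) / len(q), eps), 1 - eps)
        agree: dict[str, float] = defaultdict(float)
        count: dict[str, int] = defaultdict(int)
        for d, w, y in labels:
            agree[w] += q[d] if y == 1 else 1 - q[d]
            count[w] += 1
        acc = {w: min(max(agree[w] / count[w], eps), 1 - eps) for w in workers}

        # E-step: posterior log-odds of relevance = prior log-odds + per-label evidence.
        logit = {d: math.log(prior / (1 - prior)) for d in docs}
        for d, w, y in labels:
            lr = math.log(acc[w] / (1 - acc[w]))
            logit[d] += lr if y == 1 else -lr
        q = {d: _sigmoid(z) for d, z in logit.items()}

    return {d: int(q[d] >= 0.5) for d in docs}, acc


# Toy data: workers a, b, c are always right on d0..d5; n and n2 are always wrong.
truth = {f"d{i}": int(i < 3) for i in range(6)}
labels: list[Label] = []
for d, t in truth.items():
    labels += [(d, w, t) for w in ("a", "b", "c")]
    labels += [(d, w, 1 - t) for w in ("n", "n2")]
# On d6 (assumed truly relevant) only a, n and n2 vote, so a is outvoted 2 to 1 under MV.
labels += [("d6", "a", 1), ("d6", "n", 0), ("d6", "n2", 0)]

mv = majority_vote(labels)
em, acc = em_aggregate(labels)
print(mv["d6"], em["d6"])  # MV picks 0, EM picks 1
```

In this toy setting, EM learns from documents d0 to d5 that workers n and n2 consistently disagree with the consensus. It therefore labels d6 relevant, while MV counts their two votes like any others. Under a one-coin model, a worker whose estimated accuracy is below 0.5 is read as informative in reverse. A random spammer with accuracy near 0.5 contributes a log-odds term near zero and is effectively ignored.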