Conference: European Conference on Information Retrieval
On the Instability of Diminishing Return IR Measures

Abstract

The diminishing return property of ERR (Expected Reciprocal Rank) is highly intuitive and attractive: its user model says, for example, that after the users have found a highly relevant document at rank r, few of them will continue to examine rank (r + 1) and beyond. Recently, another IR evaluation measure based on diminishing return called iRBU (intentwise Rank-Biased Utility) was proposed, and it was reported that nDCG (normalised Discounted Cumulative Gain) and iRBU align surprisingly well with users' SERP (Search Engine Result Page) preferences. The present study conducts offline evaluations of diminishing return measures including ERR and iRBU along with other popular measures such as nDCG, using four test collections and the associated runs from recent TREC tracks and NTCIR tasks. Our results show that the diminishing return measures generally underperform other graded relevance measures in terms of system ranking consistency across two disjoint topic sets as well as discriminative power. The results generalise a previous finding on ERR regarding its limited discriminative power, showing that the diminishing return user model hurts the stability of evaluation measures regardless of the utility function part of the measure. Hence, while we do recommend iRBU along with nDCG for evaluating adhoc IR systems from multiple user-oriented angles, iRBU should be used under the awareness that it can be much less statistically stable than nDCG.
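To make the diminishing return user model concrete, here is a minimal sketch of ERR. It assumes the commonly used mapping from a graded gain g to a satisfaction probability, (2^g - 1) / 2^g_max, and the function names (`stop_probabilities`, `err`) are hypothetical. It is an illustration of the user model described in the abstract, not the evaluation code used in the study.

```python
from typing import List, Sequence


def stop_probabilities(gains: Sequence[int], max_gain: int) -> List[float]:
    """Map graded gains to per-rank satisfaction probabilities.

    Assumption: the widely used mapping (2^g - 1) / 2^max_gain.
    """
    return [(2 ** g - 1) / (2 ** max_gain) for g in gains]


def err(gains: Sequence[int], max_gain: int) -> float:
    """Expected Reciprocal Rank of a ranked list of graded gains."""
    continue_prob = 1.0  # probability that the user reaches the current rank
    score = 0.0
    for rank, stop in enumerate(stop_probabilities(gains, max_gain), start=1):
        # The user stops here with probability `stop`, earning utility 1/rank.
        score += continue_prob * stop / rank
        # Only users who were not satisfied move on to the next rank.
        continue_prob *= 1.0 - stop
    return score


if __name__ == "__main__":
    # A highly relevant document at rank 2, preceded by a nonrelevant one.
    print(err([0, 2], max_gain=2))
    # The same document at rank 2, preceded by a highly relevant one.
    print(err([2, 2], max_gain=2))
```

In this sketch, a highly relevant document at rank 2 contributes 0.375 when rank 1 is nonrelevant, but only 0.09375 when rank 1 is already highly relevant, because few simulated users continue past rank 1. That shrinking contribution is the diminishing return property the abstract refers to.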