Conference: European Conference on Information Retrieval Research

Tackling Biased Baselines in the Risk-Sensitive Evaluation of Retrieval Systems


Abstract

The aim of optimising information retrieval (IR) systems using a risk-sensitive evaluation methodology is to minimise the risk of performing any particular topic less effectively than a given baseline system. Baseline systems in this context determine the reference effectiveness for topics, relative to which the effectiveness of a given IR system in minimising the risk will be measured. However, the comparative risk-sensitive evaluation of a set of diverse IR systems - as attempted by the TREC 2013 Web track - is challenging, as the different systems under evaluation may be based upon a variety of different (base) retrieval models, such as learning to rank or language models. Hence, a question arises about how to properly measure the risk exhibited by each system. In this paper, we argue that no model of information retrieval alone is representative enough in this respect to be a true reference for the models available in the current state-of-the-art, and demonstrate, using the TREC 2012 Web track data, that as the baseline system changes, the resulting risk-based ranking of the systems changes significantly. Instead of using a particular system's effectiveness as the reference effectiveness for topics, we propose several remedies including the use of mean within-topic system effectiveness as a baseline, which is shown to enable unbiased measurements of the risk-sensitive effectiveness of IR systems.
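The effect described in the abstract can be illustrated with a minimal sketch. The abstract does not name a specific risk measure, so the code below assumes the URisk formulation used by the TREC 2013 Web track: per-topic wins over the baseline count once and losses are weighted by (1 + alpha). The function names, the three runs, and their per-topic scores are hypothetical and exist only for illustration; they are not data or code from the paper.

```python
from statistics import mean


def urisk(system, baseline, alpha=1.0):
    """URisk (assumed measure): average per-topic difference to the
    baseline, with losses weighted by (1 + alpha)."""
    if not system or len(system) != len(baseline):
        raise ValueError("system and baseline need the same non-zero number of topics")
    total = 0.0
    for s, b in zip(system, baseline):
        delta = s - b
        # Wins count once; losses are penalised more heavily.
        total += delta if delta > 0 else (1.0 + alpha) * delta
    return total / len(system)


def mean_within_topic_baseline(runs):
    """Per-topic mean effectiveness across all systems in `runs`."""
    return [mean(scores) for scores in zip(*runs.values())]


def rank_by_risk(runs, baseline, alpha=1.0):
    """System names ordered from highest to lowest URisk."""
    scores = {name: urisk(s, baseline, alpha) for name, s in runs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical per-topic effectiveness scores for three systems.
runs = {
    "A": [0.50, 0.20, 0.40],
    "B": [0.30, 0.40, 0.35],
    "C": [0.10, 0.60, 0.30],
}

# Baseline = one particular system: the ranking depends on which one.
print(rank_by_risk(runs, runs["A"]))
print(rank_by_risk(runs, runs["C"]))

# Baseline = mean within-topic effectiveness across the systems.
print(rank_by_risk(runs, mean_within_topic_baseline(runs)))
```

With these toy scores, taking system A or system C as the baseline places that system first and reverses the order of the other two, while the mean within-topic baseline does not favour whichever system was picked as the reference. This only illustrates the mechanism; it does not reproduce the paper's experiments.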
