
Risk-Sensitive Evaluation and Learning to Rank using Multiple Baselines


Abstract

A robust retrieval system ensures that user experience is not damaged by the presence of poorly-performing queries. Such robustness can be measured by risk-sensitive evaluation measures, which assess the extent to which a system performs worse than a given baseline system. However, using a particular, single system as the baseline suffers from the fact that retrieval performance highly varies among IR systems across topics. Thus, a single system would in general fail in providing enough information about the real baseline performance for every topic under consideration, and hence it would in general fail in measuring the real risk associated with any given system. Based upon the Chi-squared statistic, we propose a new measure ZRisk that exhibits more promise since it takes into account multiple baselines when measuring risk, and a derivative measure called GeoRisk, which enhances ZRisk by also taking into account the overall magnitude of effectiveness. This paper demonstrates the benefits of ZRisk and GeoRisk upon TREC data, and how to exploit GeoRisk for risk-sensitive learning to rank, thereby making use of multiple baselines within the learning objective function to obtain effective yet risk-averse/robust ranking systems. Experiments using 10,000 topics from the MSLR learning to rank dataset demonstrate the efficacy of the proposed Chi-square statistic-based objective function.
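
The abstract names the two measures but gives no formulas. As a rough illustration only, the sketch below shows one way a Chi-squared-style risk measure over multiple baselines could be computed: each system-topic score is compared with the expectation derived from the row and column totals of the systems-by-topics score matrix, and shortfalls are penalised more heavily than gains. The function names, the `alpha` risk weight, and the exact way effectiveness and risk are combined are all assumptions made for this example, not definitions taken from this record.

```python
import math

def expected_scores(scores):
    """Chi-squared-style expectations for a systems-by-topics score matrix.

    scores[i][q] is the effectiveness of system i on topic q.
    Expectation for a cell = row total * column total / grand total.
    """
    row_tot = [sum(row) for row in scores]
    col_tot = [sum(col) for col in zip(*scores)]
    total = sum(row_tot)
    return [[r * c / total for c in col_tot] for r in row_tot]

def z_risk(scores, i, alpha=1.0):
    """Assumed ZRisk-like score for system i: sum of standardized residuals,
    with negative residuals (losses) weighted by (1 + alpha)."""
    expected = expected_scores(scores)[i]
    total = 0.0
    for x, e in zip(scores[i], expected):
        if e == 0:
            continue  # all-zero topic or system: no residual defined
        z = (x - e) / math.sqrt(e)
        total += z if z >= 0 else (1 + alpha) * z
    return total

def geo_risk(scores, i, alpha=1.0):
    """Assumed GeoRisk-like score: geometric mean of the system's mean
    effectiveness and the normal CDF of its per-topic-averaged z_risk."""
    n_topics = len(scores[i])
    mean_eff = sum(scores[i]) / n_topics
    z = z_risk(scores, i, alpha) / n_topics
    phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF
    return math.sqrt(mean_eff * phi)

# Hypothetical toy data: two systems, two topics.
toy = [[0.9, 0.1],
       [0.5, 0.5]]
print(z_risk(toy, 0, alpha=0.0), z_risk(toy, 0, alpha=1.0))
print(geo_risk(toy, 0), geo_risk(toy, 1))
```

In this toy matrix the first system is strong on one topic and weak on the other, so raising `alpha` lowers its risk score, while the second system, with the same mean effectiveness but steadier per-topic scores, is not penalised in the same way.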

Bibliographic details

  • Source
    ACM SIGIR FORUM | 2016, Issue 21 | pp. 483-492 | 10 pages
  • Author affiliations

    Sitki Kocman University of Mugla, Mugla, Turkey;

    University of Glasgow, Glasgow, UK;

    University of Glasgow, Glasgow, UK;

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords
