Risk-Sensitive Evaluation and Learning to Rank using Multiple Baselines

B. Taner Dinçer; Craig Macdonald; Iadh Ounis

Abstract

A robust retrieval system ensures that user experience is not damaged by the presence of poorly-performing queries. Such robustness can be measured by risk-sensitive evaluation measures, which assess the extent to which a system performs worse than a given baseline system. However, using a particular, single system as the baseline suffers from the fact that retrieval performance highly varies among IR systems across topics. Thus, a single system would in general fail in providing enough information about the real baseline performance for every topic under consideration, and hence it would in general fail in measuring the real risk associated with any given system. Based upon the Chi-squared statistic, we propose a new measure ZRisk that exhibits more promise since it takes into account multiple baselines when measuring risk, and a derivative measure called GeoRisk, which enhances ZRisk by also taking into account the overall magnitude of effectiveness. This paper demonstrates the benefits of ZRisk and GeoRisk upon TREC data, and how to exploit GeoRisk for risk-sensitive learning to rank, thereby making use of multiple baselines within the learning objective function to obtain effective yet risk-averse/robust ranking systems. Experiments using 10,000 topics from the MSLR learning to rank dataset demonstrate the efficacy of the proposed Chi-square statistic-based objective function.
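
The abstract names ZRisk and GeoRisk but does not give their formulas. The sketch below is one possible reading and should be checked against the full paper. It assumes a systems-by-topics score matrix, a Chi-squared style expected value e_ij = (row sum x column sum) / grand total, standardized deviations z_ij = (x_ij - e_ij) / sqrt(e_ij), a ZRisk that weights negative deviations by (1 + alpha), and a GeoRisk defined as the geometric mean of a system's mean effectiveness and the standard normal CDF of ZRisk / c, where c is the number of topics. All function and parameter names are hypothetical.

```python
import math


def z_scores(scores):
    """Standardized deviations from Chi-squared style expected values.

    scores[i][j] is the (non-negative) effectiveness of system i on topic j.
    ASSUMPTION: expected value e_ij = row_sum_i * col_sum_j / grand_total.
    """
    row_sums = [sum(row) for row in scores]
    col_sums = [sum(col) for col in zip(*scores)]
    total = sum(row_sums)
    z = []
    for i, row in enumerate(scores):
        z_row = []
        for j, x in enumerate(row):
            e = row_sums[i] * col_sums[j] / total if total > 0 else 0.0
            # A zero expected value carries no information, so score it as 0.
            z_row.append((x - e) / math.sqrt(e) if e > 0 else 0.0)
        z.append(z_row)
    return z


def zrisk(scores, i, alpha=1.0):
    """ASSUMED ZRisk: wins plus (1 + alpha)-weighted losses for system i."""
    z = z_scores(scores)[i]
    wins = sum(v for v in z if v > 0)
    losses = sum(v for v in z if v < 0)
    return wins + (1.0 + alpha) * losses


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def georisk(scores, i, alpha=1.0):
    """ASSUMED GeoRisk: sqrt(mean effectiveness * Phi(ZRisk / c))."""
    c = len(scores[i])
    mean_effectiveness = sum(scores[i]) / c
    return math.sqrt(mean_effectiveness * normal_cdf(zrisk(scores, i, alpha) / c))
```

Under these assumptions, every system in the matrix acts as a baseline for every other one through the expected values, which is how the sketch reflects the abstract's "multiple baselines" idea. A larger alpha penalizes topics where a system falls below expectation more heavily.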

Bibliographic details

Source
ACM SIGIR FORUM | 2016, issue 21 | pp. 483-492 | 10 pages
Authors
B. Taner Dinçer; Craig Macdonald; Iadh Ounis
Affiliations

Sitki Kocman University of Mugla, Mugla, Turkey;

University of Glasgow, Glasgow, UK;

University of Glasgow, Glasgow, UK

Original format: PDF
Language: English

Similar literature

1. Baseline evaluation in youth ice hockey players: comparing methods for documenting prior concussions and attention or learning disorders. [J]. Carly D McKay, Kathryn J Schneider, Brian L Brooks. The Journal of Orthopaedic and Sports Physical Therapy. 2014, no. 5

2. A brief behavioral treatment for unresolved insomnia in adolescents: a single-case multiple baseline pilot study, evaluating self-reported outcomes of efficacy, safety, and acceptability [J]. Quartly-Scott Gregory I, Miller Christopher B., Hawes David J. Journal of Clinical Sleep Medicine: JCSM: official publication of the American Academy of Sleep Medicine. 2020, no. 1

3. A Multiple-Baseline Evaluation of Acceptance and Commitment Therapy Focused on Repetitive Negative Thinking for Comorbid Generalized Anxiety Disorder and Depression [J]. Francisco J. Ruiz, Carmen Luciano, Cindy L. Flórez. Frontiers in Psychology. 2020, no. a

4. Tackling Biased Baselines in the Risk-Sensitive Evaluation of Retrieval Systems [C]. B. Taner Dincer, Iadh Ounis, Craig Macdonald. European Conference on Information Retrieval Research. 2014

5. A single-subject multiple baseline and feminist intertextual deconstruction of gender differences among kindergartners in learning the alphabet using clay and a tactual/kinesthetic multiple intelligence and Montessori pedagogy. [D]. Centofanti, Joyce Michelina. 2002

6. Evaluation of survival extrapolation in immuno-oncology using multiple pre-planned data cuts: learnings to aid in model selection [O]. Ash Bullement, Anna Willis, Amerah Amin. 2020

7. Risk-Sensitive Evaluation and Learning to Rank using Multiple Baselines [O]. Dinçer, B. Taner, Macdonald, Craig, Ounis, Iadh. 2016
