Conference: International Symposium on String Processing and Information Retrieval

C/W/L Spells 'Cool': User-Based Evaluation in Information Retrieval


Abstract

The Information Retrieval community pride themselves on the strength of their evaluation protocols: working with large test collections; executing dozens or hundreds of queries taken to be representative of typical information requirements; and, in many cases, employing expert assessors to form relevance judgments. System scores using these resources are then computed using an effectiveness metric such as precision at depth k, expected reciprocal rank, or average precision; and champion-versus-challenger evaluations are carried out by considering the two system means through the lens of a statistical significance test. This presentation focuses on the effectiveness metrics that are at the heart of this batch evaluation pipeline. After describing a range of traditional approaches to measuring effectiveness, the 'C/W/L' framework [2, 3] is motivated and defined, and a range of implications of this approach to IR evaluation then explored. Notable in the C/W/L structure is the explicit correspondence between metrics and user models. This relationship makes it possible for metrics to be evaluated and compared in terms of their suitability for different types of search task, based on the extent to which the user model associated with each candidate metric correlates with observed user behavior when performing that task [1, 4, 5]. Measurement accuracy is also considered for C/W/L metrics, together with the implications that certain types of user behavior then have on experimental design.
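The correspondence between a metric and a user model can be illustrated with a minimal sketch. It assumes the formulation commonly given in the C/W/L literature, which the abstract itself does not spell out: a continuation function C(i) gives the probability that a user who has inspected rank i goes on to rank i+1, the weight W(i) is proportional to the product of C(j) over j < i, and the metric score is the weighted sum of per-rank gains. All function names below are hypothetical, and normalizing the weights over the evaluated depth only is a simplification for illustration.

```python
from typing import Callable, List


def cwl_weights(continuation: Callable[[int], float], depth: int) -> List[float]:
    """Return W(1..depth), assuming W(i) is proportional to prod_{j<i} C(j).

    Weights are normalized over the evaluated depth only (a simplification).
    """
    view_probs = []
    reach = 1.0  # probability that the user reaches the current rank
    for rank in range(1, depth + 1):
        view_probs.append(reach)
        reach *= continuation(rank)
    total = sum(view_probs)
    return [v / total for v in view_probs]


def expected_rate_of_gain(gains: List[float],
                          continuation: Callable[[int], float]) -> float:
    """Weighted sum of per-rank gains under the user model given by C(i)."""
    weights = cwl_weights(continuation, len(gains))
    return sum(w * g for w, g in zip(weights, gains))


def precision_continuation(k: int) -> Callable[[int], float]:
    """User model for precision at depth k: always continue until rank k, then stop."""
    return lambda rank: 1.0 if rank < k else 0.0


def constant_continuation(p: float) -> Callable[[int], float]:
    """User model with a fixed persistence p at every rank (rank-biased style)."""
    return lambda rank: p


if __name__ == "__main__":
    gains = [1.0, 0.0, 1.0, 1.0, 0.0]  # hypothetical binary relevance of a ranking
    print(expected_rate_of_gain(gains, precision_continuation(3)))
    print(expected_rate_of_gain(gains, constant_continuation(0.5)))
```

Under these assumptions, swapping the continuation function changes the user model and therefore the metric, while the scoring code stays the same. With `precision_continuation(3)` the score reduces to the fraction of relevant documents in the top three ranks.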
