ACM Conference on Information and Knowledge Management (CIKM)

Learning to Rank Relevant and Novel Documents through User Feedback

Abstract

We consider the problem of learning to rank relevant and novel documents so as to directly maximize a performance metric called Expected Global Utility (EGU), which has several desirable properties: (i) it measures retrieval performance in terms of both relevant and novel information; (ii) it gives more weight to top ranks to reflect the common browsing behavior of users, unlike existing objective functions based on set coverage; (iii) it accommodates different levels of tolerance toward redundancy, which existing evaluation measures do not take into account; and (iv) it extends naturally to the evaluation of session-based retrieval comprising multiple ranked lists. Our ground truth is defined in terms of "information nuggets", which are, of course, unknown to the retrieval system when it processes a new user query. Our approach therefore uses observable query and document features (words and named entities) as surrogates for nuggets, and learns their weights from user feedback in an iterative search session. The ranked list is produced so as to maximize the weighted coverage of these surrogate nuggets. Because optimizing such coverage-based metrics is known to be NP-hard, we use a greedy algorithm and show that, owing to the submodularity of the objective function, it is guaranteed to perform well. Our experiments on Topic Detection and Tracking (TDT) data show that the proposed approach is an efficient and effective retrieval strategy for maximizing EGU, compared with both a purely relevance-based ranking approach using Indri and an MMR-based (Maximal Marginal Relevance) approach to non-redundant ranking.
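The abstract describes building the ranked list by greedily maximizing the weighted coverage of surrogate nuggets (words and named entities). The following is a minimal Python sketch of that idea under stated assumptions, not the paper's implementation: each document is represented as a set of feature strings, the nugget weights are supplied as a fixed dictionary (the feedback-based learning of those weights is not shown), and a feature counts once as soon as any ranked document covers it. The function names `greedy_rank`, `coverage`, and `marginal_gain`, and all example data, are hypothetical; the sketch also omits EGU's rank discounting and redundancy tolerance.

```python
from typing import Dict, List, Set


def marginal_gain(features: Set[str], covered: Set[str], weights: Dict[str, float]) -> float:
    """Weight of the features in `features` that no previously ranked document covers."""
    return sum(weights.get(f, 0.0) for f in features - covered)


def coverage(selected: List[Set[str]], weights: Dict[str, float]) -> float:
    """Weighted coverage: each surrogate nugget counts once, however many documents contain it."""
    covered: Set[str] = set().union(*selected)
    return sum(weights.get(f, 0.0) for f in covered)


def greedy_rank(docs: Dict[str, Set[str]], weights: Dict[str, float], k: int) -> List[str]:
    """Rank up to k documents, each time appending the one with the largest marginal coverage gain."""
    ranked: List[str] = []
    covered: Set[str] = set()
    remaining = dict(docs)  # copy so the caller's dict is not mutated
    while remaining and len(ranked) < k:
        # Sorting the ids makes ties deterministic: max() keeps the first maximum it sees.
        best = max(sorted(remaining),
                   key=lambda d: marginal_gain(remaining[d], covered, weights))
        ranked.append(best)
        covered |= remaining.pop(best)
    return ranked


if __name__ == "__main__":
    # Hypothetical documents described by words / named entities, with made-up weights.
    docs = {
        "d1": {"earthquake", "Haiti", "rescue"},
        "d2": {"earthquake", "Haiti"},
        "d3": {"aid", "UN"},
    }
    weights = {"earthquake": 2.0, "Haiti": 1.5, "rescue": 0.5, "aid": 1.0, "UN": 0.8}
    print(greedy_rank(docs, weights, 2))  # ['d1', 'd3']
```

In this toy example, d2 has a higher total weight than d3 (3.5 vs. 1.8) and would rank second under a purely relevance-based score, but everything it covers is already covered by d1, so the greedy step promotes the novel d3 instead. Weighted coverage of this kind is monotone and submodular, and the standard result for greedy selection under a cardinality constraint guarantees at least a (1 - 1/e) fraction of the optimal coverage for the top k; the abstract does not say whether the paper's guarantee takes exactly this form.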