【24h】

Approximation Algorithms for Diversified Search Ranking

机译:多样化搜索排名的近似算法

获取原文

摘要

A fundamental issue in Web search is ranking search results based on user logs, since different users may have different preferences and intents with regards to a search query. Also, in many search query applications, users tend to look at only the top part of the ranked result list in order to find relevant documents. The setting we consider contains various types of users, each of which is interested in a subset of the search results. The goal is to rank the search results of a query providing highly ranked relevant results. Our performance measure is the discounted cumulative gain which offers a graded relevance scale of documents in a search engine result set, and measures the usefulness (gain) of a document based on its position in the result list. Based on this measure, we suggest a general approach to developing approximation algorithms for ranking search results that captures different aspects of users' intents. We also take into account that the relevance of one document cannot be treated independently of the relevance of other documents in a collection returned by a search engine. We first consider the scenario where users are interested in only a single search result (e.g., navigational queries). We then develop a polynomial time approximation scheme for this case. We further consider the general case where users have different requirements on the number of search results, and develop efficient approximation algorithms. Finally, we consider the problem of choosing the top k out of n search results and show that for this problem (1 — 1/e) is indeed the best approximation factor achievable, thus separating the approximability of the two versions of the problem.
机译:Web搜索中的一个基本问题是基于用户日志对搜索结果进行排名,因为不同的用户可能对搜索查询具有不同的偏好和意图。而且,在许多搜索查询应用程序中,用户倾向于仅查看排名结果列表的顶部,以查找相关文档。我们考虑的设置包含各种类型的用户,每个用户都对搜索结果的子集感兴趣。目标是对提供高度相关结果的查询的搜索结果进行排名。我们的绩效指标是折算后的累计收益,它为搜索引擎结果集中的文档提供了分级的相关性等级,并根据其在结果列表中的位置来衡量文档的有用性(收益)。基于此度量,我们建议一种通用方法来开发近似算法,以对搜索结果进行排名,以捕获用户意图的不同方面。我们还考虑到,一个文档的相关性不能独立于搜索引擎返回的集合中其他文档的相关性来对待。我们首先考虑用户只对单个搜索结果(例如导航查询)感兴趣的情况。然后,我们针对这种情况开发了多项式时间近似方案。我们进一步考虑用户对搜索结果数量有不同要求的一般情况,并开发有效的近似算法。最后,我们考虑了从n个搜索结果中选择前k个的问题,并表明对于该问题(1/1 / e)确实是可实现的最佳逼近因子,从而将问题的两个版本分开。

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号