Most learning-to-rank research has assumed that the utility of different documents is independent, which results in learned ranking functions that return redundant results. The few approaches that avoid this assumption either lack theoretical foundations or do not scale. We present a learning-to-rank formulation that optimizes the fraction of satisfied users, with several scalable algorithms that explicitly take document similarity and ranking context into account. Our formulation is a non-trivial common generalization of two multi-armed bandit models from the literature: ranked bandits (Radlinski et al., 2008) and Lipschitz bandits (Kleinberg et al., 2008b). We present theoretical justifications for this approach, as well as a near-optimal algorithm. Our evaluation adds optimizations that improve empirical performance, and shows that our algorithms learn orders of magnitude more quickly than previous approaches.
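To make the "fraction of satisfied users" objective concrete, the following is a minimal sketch of the ranked-bandits idea cited above: one bandit per rank slot, with a user counted as satisfied if they click at least one shown document. It is not the algorithm proposed in this paper. The epsilon-greedy base learner, the simulated user model (each user is a set of relevant document ids and clicks the first relevant result), and all names such as `EpsilonGreedy` and `run_ranked_bandits` are illustrative assumptions.

```python
import random


class EpsilonGreedy:
    """Simple base bandit (an assumption; other bandit algorithms could fill this role)."""

    def __init__(self, n_arms, epsilon, rng):
        self.epsilon = epsilon
        self.rng = rng
        self.counts = [0] * n_arms
        self.means = [0.0] * n_arms

    def select(self):
        # Explore with probability epsilon, otherwise pick the best empirical mean.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.means[a])

    def update(self, arm, reward):
        # Incremental update of the running mean reward for this arm.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def run_ranked_bandits(users, n_docs, k, rounds, epsilon=0.1, seed=0):
    """Simulate a ranked-bandits style learner.

    users: list of sets of relevant document ids (hypothetical user model).
    Returns the per-slot bandits and the observed fraction of satisfied users.
    """
    assert n_docs >= k, "need at least k distinct documents"
    rng = random.Random(seed)
    slots = [EpsilonGreedy(n_docs, epsilon, rng) for _ in range(k)]
    satisfied = 0
    for _ in range(rounds):
        proposed, shown = [], []
        for bandit in slots:
            doc = bandit.select()
            proposed.append(doc)
            if doc in shown:
                # Duplicate: show an arbitrary document not yet in the ranking.
                doc = rng.choice([d for d in range(n_docs) if d not in shown])
            shown.append(doc)
        relevant = rng.choice(users)
        # The user clicks the first relevant document, if there is one.
        clicked = next((i for i, d in enumerate(shown) if d in relevant), None)
        if clicked is not None:
            satisfied += 1
        for i, bandit in enumerate(slots):
            # A slot is rewarded only if its own proposal was shown and clicked.
            reward = 1.0 if (i == clicked and proposed[i] == shown[i]) else 0.0
            bandit.update(proposed[i], reward)
    return slots, satisfied / rounds
```

Because each slot is rewarded only when higher slots fail to satisfy the user, lower slots are pushed toward documents that serve different users, which is how this style of learner avoids redundant results. Note that this sketch treats every document as an unrelated arm and so does not use document similarity, the structure that the formulation described above exploits.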