【24h】

Efficient Rank Aggregation Using Partial Data

机译:使用部分数据有效等级聚合

获取原文

摘要

The need to rank items based on user input arises in many practical applications such as elections, group decision making and recommendation systems. The primary challenge in such scenarios is to decide on a global ranking based on partial preferences provided by users. The standard approach to address this challenge is to ask users to provide explicit-numerical ratings (cardinal information) of a subset of the items. The main appeal of such an approach is the ease of aggregation. However, the rating scale as well as the individual ratings are often arbitrary and may not be consistent from one user to another. A more natural alternative to numerical ratings requires users to compare pairs of items (ordinal information). On the one hand, such comparisons provide an "absolute" indicator of the user's preference. On the other hand, it is often hard to combine or aggregate these comparisons to obtain a consistent global ranking. In this work, we provide a tractable framework for utilizing comparison data as well as first-order marginal information (see Section 2) for the purpose of ranking. We treat the available information as partial samples from an unknown distribution over permutations. We then reduce ranking problems of interest to performing inference on this distribution. Specifically, we consider the problems of (a) finding an aggregate ranking of n items, (b) learning the mode of the distribution, and (c) identifying the top k items. For many of these problems, we provide efficient algorithms to infer the ranking directly from the data without the need to estimate the underlying distribution. In other cases, we use the Principle of Maximum Entropy to devise a concise parameterization of a distribution consistent with observations using only O(n~2) parameters, where n is the number of items in question. We propose a distributed, iterative algorithm for estimating the parameters of the distribution. We establish the correctness of the algorithm and identify its rate of convergence explicitly.
机译:在许多实际应用中,需要基于用户输入进行排序的需求,例如选举,组决策和推荐系统。这种情况下的主要挑战是根据用户提供的部分偏好来决定全局排名。解决这一挑战的标准方法是要求用户提供项目子集的显式数值额定值(基本信息)。这种方法的主要吸引力是易于聚合。然而,评级规模以及各个额定值通常是任意的,并且可能不会与一个用户保持一致。对数值额定值的更自然的替代方案需要用户比较成对的物品(序号)。一方面,这种比较提供了用户偏好的“绝对”指标。另一方面,通常很难结合或汇总这些比较以获得一致的全球排名。在这项工作中,我们提供了一种用于利用比较数据以及一阶边际信息(参见第2节)以进行排名的易操作框架。我们将可用信息视为来自未知分布在排列的部分样本中。然后,我们减少对这个分布表现推断的感兴趣的排名问题。具体地,我们考虑(a)找到n个项目的总排名的问题,(b)学习分发模式,(c)识别顶部k项。对于许多这些问题,我们提供高效的算法,直接从数据推断排名,而无需估计底层分布。在其他情况下,我们使用最大熵原理来设计一致的分布的简洁参数化,该分布一致地使用O(n〜2)参数,其中n是所讨论的项目数。我们提出了一种用于估计分布参数的分布式迭代算法。我们建立了算法的正确性,并明确识别其收敛速度。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号