首页> 外文期刊>JMLR: Workshop and Conference Proceedings >Learning with Limited Rounds of Adaptivity: Coin Tossing, Multi-Armed Bandits, and Ranking from Pairwise Comparisons
【24h】

Learning with Limited Rounds of Adaptivity: Coin Tossing, Multi-Armed Bandits, and Ranking from Pairwise Comparisons

机译:有限轮适应性学习:抛硬币,多武装土匪和成对比较中的排名

获取原文
获取外文期刊封面目录资料

摘要

In many learning settings, active/adaptive querying is possible, but the number of rounds of adaptivity is limited. We study the relationship between query complexity and adaptivity in identifying the $k$ most biased coins among a set of $n$ coins with unknown biases. This problem is a common abstraction of many well-studied problems, including the problem of identifying the $k$ best arms in a stochastic multi-armed bandit, and the problem of top-$k$ ranking from pairwise comparisons. An $r$-round adaptive algorithm for the $k$ most biased coins problem specifies in each round the set of coin tosses to be performed based on the observed outcomes in earlier rounds, and outputs the set of $k$ most biased coins at the end of $r$ rounds. When $r=1$, the algorithm is known as em non-adaptive; when $r$ is unbounded, the algorithm is known as em fully adaptive. While the power of adaptivity in reducing query complexity is well known, full adaptivity requires repeated interaction with the coin tossing (feedback generation) mechanism, and is highly sequential, since the set of coins to be tossed in each round can only be determined after we have observed the outcomes of the coin tosses from the previous round. In contrast, algorithms with only few rounds of adaptivity require fewer rounds of interaction with the feedback generation mechanism, and offer the benefits of parallelism in algorithmic decision-making. Motivated by these considerations, we consider the question of how much adaptivity is needed to realize the optimal worst case query complexity for identifying the $k$ most biased coins. Given any positive integer $r$, we derive essentially matching upper and lower bounds on the query complexity of $r$-round algorithms. We then show that $Θ(log^*n)$ rounds are both necessary and sufficient for achieving the optimal worst case query complexity for identifying the $k$ most biased coins. In particular, our algorithm achieves the optimal query complexity in at most $log^*n$ rounds, which implies that on any realistic input, $5$ parallel rounds of exploration suffice to achieve the optimal worst-case sample complexity. The best known algorithm prior to our work required $Θ(log n)$ rounds to achieve the optimal worst case query complexity even for the special case of $k=1$.
机译:在许多学习设置中,可以进行主动/自适应查询,但是适应性的轮数有限。我们研究了查询复杂度和适应性之间的关系,以便在具有未知偏差的一组$ n $硬币中识别出最有偏差的$ k $硬币。这个问题是许多经过充分研究的问题的常见抽象,包括在随机多臂匪徒中确定$ k $最佳武器的问题,以及从成对比较中获得最高$ k $排名的问题。针对$ k $最偏向硬币问题的$ r $一轮自适应算法在每个回合中指定将基于较早回合中观察到的结果进行抛硬币的集合,并在以下位置输出$ k $最偏向硬币的集合: $ r $回合结束。当$ r = 1 $时,该算法称为 em非自适应;当$ r $无界时,该算法称为 em完全自适应。尽管适应性在降低查询复杂度方面的能力众所周知,但是完全适应性要求与抛硬币(反馈生成)机制进行反复交互,并且具有高度顺序性,因为每一轮要抛硬币的集合只能在我们确定之后才能进行。观察了上一轮抛硬币的结果。相比之下,只有几轮适应性的算法需要与反馈生成机制进行更少的交互,并在算法决策中提供了并行性的好处。出于这些考虑,我们考虑了为确定最佳的最差情况查询复杂度以识别$ k $最有偏见的硬币需要多少适应性的问题。给定任何正整数$ r $,我们得出$ r $舍入算法的查询复杂度的本质上匹配的上下限。然后,我们证明$Θ( log ^ * n)$轮次对于实现最佳最坏情况查询复杂度以识别$ k $最有偏见的硬币既必要又足够。特别是,我们的算法最多在$ log ^ * n $轮中获得了最佳查询复杂度,这意味着在任何实际输入上,$ 5 $并行探索轮足以实现最佳的最坏情况下的样本复杂度。即使在$ k = 1 $的特殊情况下,我们工作之前最知名的算法也需要进行$Θ( log n)$轮次才能获得最佳的最坏情况查询复杂度。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号