Learning with Limited Rounds of Adaptivity: Coin Tossing, Multi-Armed Bandits, and Ranking from Pairwise Comparisons

Arpit Agarwal; Shivani Agarwal; Sepehr Assadi; Sanjeev Khanna

首页> 外文期刊>JMLR: Workshop and Conference Proceedings >Learning with Limited Rounds of Adaptivity: Coin Tossing, Multi-Armed Bandits, and Ranking from Pairwise Comparisons

【24h】

Learning with Limited Rounds of Adaptivity: Coin Tossing, Multi-Armed Bandits, and Ranking from Pairwise Comparisons

机译：有限轮适应性学习：抛硬币，多武装土匪和成对比较中的排名

获取原文

获取外文期刊封面目录资料

开具论文收录证明 >>

文献代查 >>

文献数据库（团队版） >>

页面导航

摘要
著录项
引文网络
相似文献
相关主题

摘要

In many learning settings, active/adaptive querying is possible, but the number of rounds of adaptivity is limited. We study the relationship between query complexity and adaptivity in identifying the $k$ most biased coins among a set of $n$ coins with unknown biases. This problem is a common abstraction of many well-studied problems, including the problem of identifying the $k$ best arms in a stochastic multi-armed bandit, and the problem of top-$k$ ranking from pairwise comparisons. An $r$-round adaptive algorithm for the $k$ most biased coins problem specifies in each round the set of coin tosses to be performed based on the observed outcomes in earlier rounds, and outputs the set of $k$ most biased coins at the end of $r$ rounds. When $r=1$, the algorithm is known as em non-adaptive; when $r$ is unbounded, the algorithm is known as em fully adaptive. While the power of adaptivity in reducing query complexity is well known, full adaptivity requires repeated interaction with the coin tossing (feedback generation) mechanism, and is highly sequential, since the set of coins to be tossed in each round can only be determined after we have observed the outcomes of the coin tosses from the previous round. In contrast, algorithms with only few rounds of adaptivity require fewer rounds of interaction with the feedback generation mechanism, and offer the benefits of parallelism in algorithmic decision-making. Motivated by these considerations, we consider the question of how much adaptivity is needed to realize the optimal worst case query complexity for identifying the $k$ most biased coins. Given any positive integer $r$, we derive essentially matching upper and lower bounds on the query complexity of $r$-round algorithms. We then show that $Θ(log^*n)$ rounds are both necessary and sufficient for achieving the optimal worst case query complexity for identifying the $k$ most biased coins. In particular, our algorithm achieves the optimal query complexity in at most $log^*n$ rounds, which implies that on any realistic input, $5$ parallel rounds of exploration suffice to achieve the optimal worst-case sample complexity. The best known algorithm prior to our work required $Θ(log n)$ rounds to achieve the optimal worst case query complexity even for the special case of $k=1$.

机译：在许多学习设置中，可以进行主动/自适应查询，但是适应性的轮数有限。我们研究了查询复杂度和适应性之间的关系，以便在具有未知偏差的一组$ n $硬币中识别出最有偏差的$ k $硬币。这个问题是许多经过充分研究的问题的常见抽象，包括在随机多臂匪徒中确定$ k $最佳武器的问题，以及从成对比较中获得最高$ k $排名的问题。针对$ k $最偏向硬币问题的$ r $一轮自适应算法在每个回合中指定将基于较早回合中观察到的结果进行抛硬币的集合，并在以下位置输出$ k $最偏向硬币的集合： $ r $回合结束。当$ r = 1 $时，该算法称为 em非自适应;当$ r $无界时，该算法称为 em完全自适应。尽管适应性在降低查询复杂度方面的能力众所周知，但是完全适应性要求与抛硬币（反馈生成）机制进行反复交互，并且具有高度顺序性，因为每一轮要抛硬币的集合只能在我们确定之后才能进行。观察了上一轮抛硬币的结果。相比之下，只有几轮适应性的算法需要与反馈生成机制进行更少的交互，并在算法决策中提供了并行性的好处。出于这些考虑，我们考虑了为确定最佳的最差情况查询复杂度以识别$ k $最有偏见的硬币需要多少适应性的问题。给定任何正整数$ r $，我们得出$ r $舍入算法的查询复杂度的本质上匹配的上下限。然后，我们证明$Θ（ log ^ * n）$轮次对于实现最佳最坏情况查询复杂度以识别$ k $最有偏见的硬币既必要又足够。特别是，我们的算法最多在$ log ^ * n $轮中获得了最佳查询复杂度，这意味着在任何实际输入上，$ 5 $并行探索轮足以实现最佳的最坏情况下的样本复杂度。即使在$ k = 1 $的特殊情况下，我们工作之前最知名的算法也需要进行$Θ（ log n）$轮次才能获得最佳的最坏情况查询复杂度。

著录项

来源
《JMLR: Workshop and Conference Proceedings》 |2017年第1期|共37页
作者
Arpit Agarwal; Shivani Agarwal; Sepehr Assadi; Sanjeev Khanna;
展开▼
作者单位

展开▼
收录信息
原文格式 PDF
正文语种
中图分类人工智能理论;
关键词

相似文献

外文文献
中文文献
专利

1. AdaptiveBandit: A Multi-armed Bandit Framework for Adaptive Sampling in Molecular Simulations [J] . Adrià Pérez, Pablo Herrera-Nieto, Stefan Doerr, Journal of chemical theory and computation: JCTC . 2020,第7期

机译：AdaptiveBandit：用于分子模拟中的自适应采样的多武装强盗框架
2. Statistical Consequences of using Multi-armed Bandits to Conduct Adaptive Educational Experiments [J] . Anna Rafferty, Huiji Ying, Joseph Williams Journal of Educational Data Mining . 2019,第1期

机译：使用多臂土匪进行适应性教育实验的统计结果
3. Bayesian adaptive bandit-based designs using the Gittins index for multi-armed trials with normally distributed endpoints [J] . Smith Adam L., Villar Sofia S. Journal of applied statistics . 2018,第5a8期

机译：使用Gittins索引的基于贝叶斯自适应强盗的设计，用于具有正态分布端点的多臂试验
4. Learning diverse rankings with multi-armed bandits [C] . Filip Radlinski, Robert Kleinberg, Thorsten Joachims International conference on Machine learning . 2008

机译：与多臂土匪学习不同的排名
5. Multi-Armed Bandits for Preference Learning [D] . Katariya, Sumeet. 2018

机译：偏好学习的多武装土匪
6. Bayesian adaptive bandit-based designs using the Gittins index for multi-armed trials with normally distributed endpoints [O] . Adam L. Smith, Sofía S. Villar -1

机译：使用Gittins索引的基于贝叶斯自适应强盗的设计用于具有正态分布端点的多臂试验
7. Exploration with limited memory: streaming algorithms for coin tossing, noisy comparisons, and multi-armed bandits [O] . Sepehr Assadi, Chen Wang 2020

机译：有限的记忆探索：用于硬币折腾，嘈杂比较和多武装匪徒的流媒体算法
8. Learning in A Changing World: Non-Bayesian Restless Multi-Armed Bandit [R] . Liu, H., Liu, K., Zhao, Q. 2010

机译：在变化的世界中学习：非贝叶斯不安定的多武装强盗

Learning with Limited Rounds of Adaptivity: Coin Tossing, Multi-Armed Bandits, and Ranking from Pairwise Comparisons

摘要

著录项

引文网络

相似文献

相关主题

期刊订阅