On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models

Emilie Kaufmann; Olivier Capp??; Aur??lien Garivier

首页> 外文期刊>Journal of machine learning research >On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models

【24h】

On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models

机译：多臂强盗模型中最佳武器识别的复杂性

获取原文

掌桥外文数据库（机构版） >>

开具论文收录证明 >>

文献代查 >>

页面导航

摘要
著录项
相似文献
相关主题

摘要

The stochastic multi-armed bandit model is a simple abstractionthat has proven useful in many different contexts in statisticsand machine learning. Whereas the achievable limit in terms ofregret minimization is now well known, our aim is to contributeto a better understanding of the performance in terms ofidentifying the $m$ best arms. We introduce generic notions ofcomplexity for the two dominant frameworks considered in theliterature: fixed-budget and fixed-confidence settings. In thefixed-confidence setting, we provide the first knowndistribution-dependent lower bound on the complexity thatinvolves information-theoretic quantities and holds when $mgeq1$ under general assumptions. In the specific case of two armed-bandits, we derive refined lower bounds in both the fixed-confidence and fixed-budget settings, along with matchingalgorithms for Gaussian and Bernoulli bandit models. Theseresults show in particular that the complexity of the fixed-budget setting may be smaller than the complexity of the fixed-confidence setting, contradicting the familiar behavior observedwhen testing fully specified alternatives. In addition, we alsoprovide improved sequential stopping rules that have guaranteederror probabilities and shorter average running times. Theproofs rely on two technical results that are of independentinterest : a deviation lemma for self-normalized sums (Lemma 7)and a novel change of measure inequality for bandit models(Lemma 1). color="gray">

机译：随机多武装匪徒模型是一个简单的抽象，已证明在统计和机器学习的许多不同环境中很有用。虽然众所周知，在后悔最小化方面可以达到的极限是众所周知的，但我们的目标是通过识别$ m $最佳武器来更好地理解性能。我们为文学中考虑的两个主要框架引入了复杂性的通用概念：固定预算和固定置信度设置。在固定置信度设置中，我们提供了第一个已知的与分布相关的下界，该下界涉及涉及信息理论量的复杂度，在一般假设下，当$ mgeq1 $成立时，该下界成立。在两个武装匪徒的特定案例中，我们推导了固定置信度和固定预算环境下的精确下界，以及高斯和伯努利匪徒模型的匹配算法。这些结果尤其表明，固定预算设置的复杂性可能小于固定置信设置的复杂性，这与测试完全指定的替代方案时观察到的熟悉行为相矛盾。此外，我们还提供了改进的顺序停止规则，这些规则可确保出现错误的可能性和较短的平均运行时间。证明依赖于两个相互独立的技术结果：自归一化和的偏差引理（引理7）和匪徒模型的度量不等式的新颖变化（引理1）。 color =“ gray”>

著录项

来源
《Journal of machine learning research》 |2016年第1期|共42页
作者
Emilie Kaufmann; Olivier Capp??; Aur??lien Garivier;
展开▼
作者单位

展开▼
收录信息
原文格式 PDF
正文语种
中图分类计算技术、计算机技术;
关键词

相似文献

外文文献
中文文献
专利

1. The Sample Complexity of Exploration in the Multi-Armed Bandit Problem [J] . Mannor Shie, Tsitsiklis John N. Journal of machine learning research . 2004,第Jun期

机译：多武装强盗问题中探索的样本复杂性
2. Best arm identification in multi-armed bandits with delayed feedback [J] . Aditya Grover, Todor Markov, Peter Attia, JMLR: Workshop and Conference Proceedings . 2018,第3期

机译：具有延迟反馈的多臂匪的最佳臂识别
3. ON THE IDENTIFICATION AND MITIGATION OF WEAKNESSES IN THE KNOWLEDGE GRADIENT POLICY FOR MULTI-ARMED BANDITS [J] . Edwards James, Fearnhead Paul, Glazebrook Kevin Probability in the Engineering and Informational Sciences . 2017,第2期

机译：多武装匪徒知识梯度策略中的弱点识别与缓解
4. ε-Best-Arm Identification in Pay-Per-Reward Multi-Armed Bandits [C] . Sivan Sabato Conference on Neural Information Processing Systems . 2020

机译：ε-最佳武装识别在每次付费多武装匪徒
5. Essays on sequential analysis: Multi-armed bandit with availability constraints and sequential change detection and identification. [D] . Yamazaki, Kazutoshi. 2009

机译：关于顺序分析的文章：具有可用性约束以及顺序更改检测和识别的多臂匪。
6. Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges [O] . Sofía S. Villar, Jack Bowden, James Wason -1

机译：用于临床试验优化设计的多臂Bandit模型：好处和挑战
7. On Sequential Elimination Algorithms for Best-Arm Identification in Multi-Armed Bandits [O] . Shahrampour, Shahin, Noshad, Mohammad, Tarokh, Vahid 2017

机译：关于最佳臂识别的序贯消除算法多武装强盗

On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models

摘要

著录项

相似文献

相关主题

期刊订阅