...
首页> 外文期刊>Journal of machine learning research >On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models
【24h】

On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models

机译:多臂强盗模型中最佳武器识别的复杂性

获取原文
           

摘要

The stochastic multi-armed bandit model is a simple abstractionthat has proven useful in many different contexts in statisticsand machine learning. Whereas the achievable limit in terms ofregret minimization is now well known, our aim is to contributeto a better understanding of the performance in terms ofidentifying the $m$ best arms. We introduce generic notions ofcomplexity for the two dominant frameworks considered in theliterature: fixed-budget and fixed-confidence settings. In thefixed-confidence setting, we provide the first knowndistribution-dependent lower bound on the complexity thatinvolves information-theoretic quantities and holds when $mgeq1$ under general assumptions. In the specific case of two armed-bandits, we derive refined lower bounds in both the fixed-confidence and fixed-budget settings, along with matchingalgorithms for Gaussian and Bernoulli bandit models. Theseresults show in particular that the complexity of the fixed-budget setting may be smaller than the complexity of the fixed-confidence setting, contradicting the familiar behavior observedwhen testing fully specified alternatives. In addition, we alsoprovide improved sequential stopping rules that have guaranteederror probabilities and shorter average running times. Theproofs rely on two technical results that are of independentinterest : a deviation lemma for self-normalized sums (Lemma 7)and a novel change of measure inequality for bandit models(Lemma 1). color="gray">
机译:随机多武装匪徒模型是一个简单的抽象,已证明在统计和机器学习的许多不同环境中很有用。虽然众所周知,在后悔最小化方面可以达到的极限是众所周知的,但我们的目标是通过识别$ m $最佳武器来更好地理解性能。我们为文学中考虑的两个主要框架引入了复杂性的通用概念:固定预算和固定置信度设置。在固定置信度设置中,我们提供了第一个已知的与分布相关的下界,该下界涉及涉及信息理论量的复杂度,在一般假设下,当$ mgeq1 $成立时,该下界成立。在两个武装匪徒的特定案例中,我们推导了固定置信度和固定预算环境下的精确下界,以及高斯和伯努利匪徒模型的匹配算法。这些结果尤其表明,固定预算设置的复杂性可能小于固定置信设置的复杂性,这与测试完全指定的替代方案时观察到的熟悉行为相矛盾。此外,我们还提供了改进的顺序停止规则,这些规则可确保出现错误的可能性和较短的平均运行时间。证明依赖于两个相互独立的技术结果:自归一化和的偏差引理(引理7)和匪徒模型的度量不等式的新颖变化(引理1)。 color =“ gray”>

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号