Annual conference on Neural Information Processing Systems

Explore no more: Improved high-probability regret bounds for non-stochastic bandits


Abstract

This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability. Such results are rather scarce in the literature, since proving them requires a great deal of technical effort and significant modifications to the standard, more intuitive algorithms that come only with guarantees that hold in expectation. One of these modifications is forcing the learner to sample arms from the uniform distribution at least Ω(√T) times over T rounds, which can adversely affect performance if many of the arms are suboptimal. While it is widely conjectured that this property is essential for proving high-probability regret bounds, we show in this paper that it is possible to achieve such strong results without this undesirable exploration component. Our result relies on a simple and intuitive loss-estimation strategy called Implicit eXploration (IX) that allows a remarkably clean analysis. To demonstrate the flexibility of our technique, we derive several improved high-probability bounds for various extensions of the standard multi-armed bandit framework. Finally, we conduct a simple experiment that illustrates the robustness of our implicit exploration technique.
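The abstract does not spell out the estimator itself, but implicit exploration is commonly realized by dividing the observed loss of the chosen arm by p + γ rather than by p, which keeps every loss estimate bounded without mixing a forced uniform distribution into the sampling probabilities. The sketch below illustrates this idea inside a plain exponential-weights learner; the function name exp3_ix, its interface, and the parameter choices are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def exp3_ix(losses, eta, gamma, rng=None):
    """Exponential-weights bandit learner with an implicit-exploration
    (IX) loss estimate; a sketch under stated assumptions, not the
    paper's exact algorithm.

    losses : (T, K) array of per-round, per-arm losses in [0, 1];
             only the chosen arm's loss is revealed to the learner.
    eta    : learning rate for the exponential-weights update.
    gamma  : IX parameter; gamma > 0 replaces the importance weight
             1/p by 1/(p + gamma), so every estimate stays below 1/gamma.
    """
    rng = np.random.default_rng() if rng is None else rng
    T, K = losses.shape
    cum_est = np.zeros(K)          # cumulative estimated losses per arm
    total_loss = 0.0
    for t in range(T):
        # Sampling distribution: plain exponential weights, with no
        # forced uniform-exploration component.
        w = np.exp(-eta * (cum_est - cum_est.min()))
        p = w / w.sum()
        arm = rng.choice(K, p=p)
        loss = losses[t, arm]
        total_loss += loss
        # IX estimate: observed loss divided by (p + gamma) for the
        # chosen arm, and zero for all arms that were not played.
        cum_est[arm] += loss / (p[arm] + gamma)
    return total_loss
```

With gamma = 0 this reduces to the usual unbiased importance-weighted estimate; a positive gamma biases the estimates slightly downward but caps them at 1/gamma, the kind of boundedness that makes a high-probability analysis tractable without explicit uniform exploration.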
