Conference on Neural Information Processing Systems

Improved Regret Bounds for Bandit Combinatorial Optimization



Abstract

Bandit combinatorial optimization is a bandit framework in which, in each round, a player chooses an action from a given finite set A ⊆ {0, 1}^d and incurs a loss equal to the inner product of the chosen action and an unobservable loss vector in R^d. In this paper, we aim to reveal the property that makes bandit combinatorial optimization hard. Recently, Cohen et al. [8] obtained a regret lower bound of Ω((dk^3 T / log T)^(1/2)), where k is the maximum l_1-norm of the action vectors and T is the number of rounds. This lower bound was achieved by considering a continuous, strongly correlated distribution of losses. Our main contribution is to improve this bound by a factor of (log T)^(1/2), to Ω((dk^3 T)^(1/2)), by means of strongly correlated losses with binary values. This bound yields better regret lower bounds for three specific instances of bandit combinatorial optimization: the multitask bandit, the bandit ranking, and the multiple-play bandit. In particular, the bound obtained for the bandit ranking in the present study settles an open problem raised in [8]. In addition, we demonstrate that the problem becomes easier without correlation among the entries of the loss vectors. In fact, if each entry of the loss vector is an independent random variable, then one can achieve a regret of O((dk^2 T)^(1/2)), which is a factor of k^(1/2) smaller than the lower bound shown above. These results indicate that correlation among losses is the reason for the large regret.
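To make the protocol concrete, the interaction described in the abstract can be sketched in a short simulation: each round the player picks a binary action vector of l_1-norm at most k, observes only the scalar loss ⟨a, ℓ_t⟩ (the loss vector itself stays hidden), and regret is measured against the best fixed action in hindsight. This is an illustrative sketch only, with a uniform-random player and i.i.d. binary losses; it is not the correlated-loss construction from the paper, and the function name `simulate` is hypothetical.

```python
import itertools
import random

def simulate(d=4, k=2, T=200, seed=0):
    """Illustrative sketch of one bandit combinatorial optimization run
    (hypothetical helper, not the paper's construction).

    Each round the player picks an action a in A (binary vectors of
    weight exactly k) and observes only the scalar loss <a, loss_vec>.
    Returns the regret of a uniform-random player against the best
    fixed action in hindsight.
    """
    rng = random.Random(seed)
    # Action set A: all binary vectors of l_1-norm exactly k.
    actions = [a for a in itertools.product((0, 1), repeat=d) if sum(a) == k]
    total_loss = 0.0                         # player's cumulative loss
    cum_loss = [0.0] * len(actions)          # cumulative loss of each fixed action
    for _ in range(T):
        # i.i.d. binary losses for simplicity (the paper's hard instance
        # instead uses strongly correlated binary losses).
        loss_vec = [rng.randint(0, 1) for _ in range(d)]
        # A naive uniform-random player; only the scalar loss is revealed.
        a = actions[rng.randrange(len(actions))]
        total_loss += sum(ai * li for ai, li in zip(a, loss_vec))
        # Full-information bookkeeping, used only to compute regret afterwards.
        for i, b in enumerate(actions):
            cum_loss[i] += sum(bi * li for bi, li in zip(b, loss_vec))
    best_fixed = min(cum_loss)               # best fixed action in hindsight
    return total_loss - best_fixed
```

Since every per-round loss lies in [0, k], any player's regret over T rounds is trivially bounded in magnitude by kT; the paper's results concern how the optimal regret scales with d, k, and T inside that range.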
