
Finite-time Regret Bounds for the Multiarmed Bandit Problem


Abstract

We show finite-time regret bounds for the multiarmed bandit problem under the assumption that all rewards come from a bounded and fixed range. Our regret bounds after any number T of pulls are of the form a + b log T + c log^2 T, where a, b, and c are positive constants not depending on T. These bounds are shown to hold for variants of the popular ε-greedy and Boltzmann allocation rules, and for a new simple deterministic allocation rule. Moreover, our results also apply to an extension of the basic bandit problem in which reward distributions can depend, to some extent, on previous pulls and observed rewards. Finally, we discuss the empirical performance of our algorithms with respect to specific choices of the reward distributions.
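To make the notion of regret for an ε-greedy allocation rule concrete, here is a minimal Python sketch. It is not the variant analyzed in the paper: the Bernoulli reward model, the exploration schedule eps = min(1, c*k/t), the constant c, and the function name `epsilon_greedy_regret` are all assumptions chosen for illustration. The function tracks the cumulative expected regret, that is, the sum over pulls of the gap between the best arm's mean and the mean of the arm actually pulled.

```python
import random


def epsilon_greedy_regret(means, horizon, c=5.0, seed=0):
    """Simulate a decaying epsilon-greedy rule on Bernoulli arms.

    Returns a list whose entry t-1 is the cumulative expected regret
    after t pulls. The schedule eps = min(1, c*k/t) is an assumption
    for this sketch, not taken from the abstract.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k          # number of pulls per arm
    sums = [0.0] * k          # total observed reward per arm
    best = max(means)         # mean reward of the best arm
    regret = 0.0
    history = []
    for t in range(1, horizon + 1):
        eps = min(1.0, c * k / t)
        # Explore uniformly with probability eps, or while any arm is untried.
        if rng.random() < eps or 0 in counts:
            arm = rng.randrange(k)
        else:
            # Exploit: pick the arm with the highest empirical mean.
            arm = max(range(k), key=lambda i: sums[i] / counts[i])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
        history.append(regret)
    return history


if __name__ == "__main__":
    curve = epsilon_greedy_regret([0.9, 0.5], horizon=1000)
    print(curve[-1])
```

Plotting the returned curve against log T for a few choices of `means` gives a rough visual check of whether the regret grows logarithmically in a given run; such a plot is only an empirical illustration and says nothing about the constants a, b, and c in the bounds.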


Bibliographic record

Source: Machine Learning, 1998, pp. 100-108 (9 pages)
Conference location: Madison, WI (US)
Authors: Nicolo Cesa-Bianchi; Paul Fischer
Affiliations:

DSI, University of Milan, via Comelico 39, I-20315 Milano, Italy

Lehrstuhl Informatik II, Universitaet Dortmund, D-44221 Dortmund, Germany

Original format: PDF
Language: English
CLC classification: Computer applications

