UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem

Auer P.; Ortner R.

Abstract

In the stochastic multi-armed bandit problem we consider a modification of the UCB algorithm of Auer et al. [4]. For this modified algorithm we give an improved bound on the regret with respect to the optimal reward. While for the original UCB algorithm the regret in K-armed bandits after T trials is bounded by const · K log(T)/Δ, where Δ measures the distance between a suboptimal arm and the optimal arm, for the modified UCB algorithm we show an upper bound on the regret of const · K log(TΔ²)/Δ.
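To get a feel for how much the improvement matters, the two bound shapes can be compared numerically. The sketch below assumes the forms usually quoted for this paper, K log(T)/Δ for the original UCB algorithm and K log(TΔ²)/Δ for the modified one, and drops the unspecified constant factors, so only the ratio between the two is informative. The function names are hypothetical and are not taken from the paper.

```python
import math


def original_ucb_bound_shape(K: int, T: int, delta: float) -> float:
    """Shape of the original UCB regret bound, K * log(T) / delta.

    The constant factor is omitted (assumption: only the shape matters here).
    """
    return K * math.log(T) / delta


def modified_ucb_bound_shape(K: int, T: int, delta: float) -> float:
    """Shape of the modified UCB regret bound, K * log(T * delta^2) / delta.

    The constant factor is omitted. The expression is only informative when
    T * delta^2 > 1; otherwise the logarithm is non-positive, so we refuse.
    """
    arg = T * delta ** 2
    if arg <= 1.0:
        raise ValueError("T * delta^2 must exceed 1 for this bound shape")
    return K * math.log(arg) / delta


if __name__ == "__main__":
    K, T = 10, 10_000  # illustrative values, not from the paper
    for delta in (0.05, 0.1, 0.5):
        old = original_ucb_bound_shape(K, T, delta)
        new = modified_ucb_bound_shape(K, T, delta)
        print(f"delta={delta}: original={old:.1f} modified={new:.1f} "
              f"ratio={new / old:.3f}")
```

For T = 10000 and Δ = 0.1 the ratio is log(100)/log(10000) = 1/2. The gain is largest when Δ is small relative to 1/√T, and it shrinks as Δ grows.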

Bibliographic details

Source: Periodica Mathematica Hungarica: Journal of the Janos Bolyai Mathematical Society, 2010, Issue 2, 11 pages
Authors: Auer P.; Ortner R.
Original format: PDF
Language: English
CLC classification: Mathematics
Keywords: multi-armed bandit problem; regret

