In the stochastic multi-armed bandit problem we consider a modification of the UCB algorithm of Auer et al. [4]. For this modified algorithm we give an improved bound on the regret with respect to the optimal reward. While for the original UCB algorithm the regret in K-armed bandits after T trials is bounded by const · $\frac{K\log T}{\Delta}$, where Δ measures the distance between a suboptimal arm and the optimal arm, for the modified UCB algorithm we show an upper bound on the regret of const · $\frac{K\log(T\Delta^2)}{\Delta}$.
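For context, the index rule of the original UCB algorithm (UCB1) referenced above can be sketched as follows. This is a minimal illustration, not the modified algorithm analyzed in the paper; the arm reward distributions and the horizon are assumptions for the example.

```python
import math
import random

def ucb1(reward_fns, horizon, seed=0):
    """Minimal UCB1 sketch: play the arm maximizing the index
    empirical mean + sqrt(2 ln t / n_i), after one initial pull per arm.

    reward_fns: list of zero-argument callables returning a reward in [0, 1]
    horizon:    total number of trials T
    Returns per-arm pull counts and reward sums.
    """
    random.seed(seed)
    k = len(reward_fns)
    counts = [0] * k    # n_i: number of pulls of arm i
    sums = [0.0] * k    # cumulative reward of arm i
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: play each arm once
        else:
            # pick the arm with the largest upper confidence index
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        counts[arm] += 1
        sums[arm] += reward_fns[arm]()
    return counts, sums
```

With two hypothetical Bernoulli arms of means 0.9 and 0.1 (so Δ = 0.8), the index concentrates pulls on the better arm, and the suboptimal arm is pulled only O(log T / Δ²) times, which is where the K log T / Δ regret bound comes from.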