Time-varying stochastic multi-armed bandit problems

Abstract

In this paper, we consider a time-varying stochastic multi-armed bandit (MAB) problem where the unknown reward distribution of each arm can change arbitrarily over time. We obtain a lower bound on the regret order and demonstrate that an online learning algorithm achieves this lower bound. We further consider a piece-wise stationary model of the arm reward distributions and establish the regret performance of an online learning algorithm in terms of the number of change points experienced by the reward distributions over the time horizon.
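The abstract does not say which online learning algorithm is analyzed, so the following is only an illustrative sketch of the piecewise-stationary setting it describes. It uses sliding-window UCB, a standard technique swapped in here for illustration and not necessarily the authors' method. The function name, the equal-length segments, the Bernoulli rewards, and the `window` parameter are all assumptions. The sketch accumulates dynamic regret, measured against the best arm of the current segment.

```python
import math
import random
from collections import deque


def sliding_window_ucb(means_by_segment, horizon, window, seed=0):
    """Simulate sliding-window UCB on a piecewise-stationary Bernoulli bandit.

    means_by_segment is a hypothetical input: one list of arm means per
    stationary segment, with segments of equal length (an assumption).
    Returns the cumulative dynamic pseudo-regret over the horizon.
    """
    rng = random.Random(seed)
    n_arms = len(means_by_segment[0])
    seg_len = max(1, horizon // len(means_by_segment))
    history = deque(maxlen=window)  # keeps only the last `window` plays
    regret = 0.0

    for t in range(horizon):
        # Arm means change only at segment boundaries (the change points).
        seg = min(t // seg_len, len(means_by_segment) - 1)
        means = means_by_segment[seg]

        # Recompute per-arm statistics from the window only, so that
        # observations from before an old change point are forgotten.
        counts = [0] * n_arms
        sums = [0.0] * n_arms
        for arm, reward in history:
            counts[arm] += 1
            sums[arm] += reward

        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            # Play any arm that has no observation inside the window.
            chosen = untried[0]
        else:
            n = len(history)
            chosen = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(n) / counts[a]),
            )

        reward = 1.0 if rng.random() < means[chosen] else 0.0
        history.append((chosen, reward))
        regret += max(means) - means[chosen]

    return regret
```

For example, `sliding_window_ucb([[0.9, 0.1], [0.1, 0.9]], horizon=2000, window=200)` simulates a two-armed problem with a single change point at the midpoint. A shorter window adapts faster after a change but explores more in stationary stretches, which is the trade-off that regret bounds stated in terms of the number of change points capture.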

Bibliographic record

Source
Asilomar Conference on Signals, Systems and Computers | 2014 | pp. 2103-2107 | 5 pages
Authors
Vakili Sattar; Qing Zhao; Yuan Zhou
Original format: PDF

Similar literature

1. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems [J]. Sebastien Bubeck, Nicolo Cesa-Bianchi. Foundations and Trends in Machine Learning. 2012, no. 1
2. Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect [J]. Priyank Agrawal, Theja Tulabandula. JMLR: Workshop and Conference Proceedings. 2020, no. 2010
3. Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without [J]. Sébastien Bubeck, Yuanzhi Li, Yuval Peres. JMLR: Workshop and Conference Proceedings. 2020, no. 2010
4. Time-varying stochastic multi-armed bandit problems [C]. Vakili Sattar, Qing Zhao, Yuan Zhou. Asilomar Conference on Signals, Systems and Computers. 2014
5. From Stability to Low-Regret Algorithms in Stochastic Multi-Armed Bandits [D]. Huang, Kuan-Sung. 2021
6. An Analysis of the Value of Information When Exploring Stochastic Discrete Multi-Armed Bandits [O]. Isaac J. Sledge, José C. Príncipe. 2018
7. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems [O]. 2016
