Conference on Neural Information Processing Systems

Regret Bounds for Thompson Sampling in Episodic Restless Bandit Problems


Abstract

Restless bandit problems are instances of non-stationary multi-armed bandits. These problems have been well studied from the optimization perspective, where the goal is to efficiently find a near-optimal policy when the system parameters are known. However, very few papers adopt a learning perspective, in which the parameters are unknown. In this paper, we analyze the performance of Thompson sampling in episodic restless bandits with unknown parameters. We consider a general policy map to define our competitor and prove an O(√T) Bayesian regret bound. Our competitor is flexible enough to represent various benchmarks, including the best fixed-action policy, the optimal policy, the Whittle index policy, or the myopic policy. We also present empirical results that support our theoretical findings.
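
To make the algorithmic pattern concrete, below is a minimal Python sketch of episodic Thompson sampling for restless bandits. It is not the construction analyzed in the paper: it assumes fully observed two-state arms with independent Beta priors on every transition probability, and uses the myopic policy (computed under the sampled model) as the competitor. The toy instance and all names (true_p, p_hat, and so on) are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical toy instance: N two-state arms (0 = bad, 1 = good).
    # Every arm transitions at every step whether or not it is pulled
    # (the "restless" property); pulling an arm in the good state pays 1.
    N, EPISODES, HORIZON = 4, 200, 20
    true_p = rng.uniform(0.2, 0.8, size=(N, 2))  # P(next good | arm i, state s)

    # Beta(alpha, beta) posterior on each transition probability.
    alpha = np.ones((N, 2))
    beta = np.ones((N, 2))
    arms = np.arange(N)

    total_reward = 0
    for ep in range(EPISODES):
        # Thompson sampling: draw one model from the posterior per episode.
        p_hat = rng.beta(alpha, beta)

        states = np.zeros(N, dtype=int)  # each episode resets the system
        for t in range(HORIZON):
            # Myopic policy under the sampled model: pull the arm most
            # likely to be in the good state next step.
            arm = int(np.argmax(p_hat[arms, states]))

            # Restless dynamics: all arms transition simultaneously.
            next_states = (rng.random(N) < true_p[arms, states]).astype(int)
            total_reward += next_states[arm]

            # Simplification for this sketch: all transitions are observed,
            # so every (state -> next state) pair updates its posterior.
            alpha[arms, states] += next_states
            beta[arms, states] += 1 - next_states
            states = next_states

    print("total reward:", total_reward)
    print("posterior means:\n", alpha / (alpha + beta))
    print("true parameters:\n", true_p)

The episodic structure is the key point: one model is drawn from the posterior at the start of each episode and a policy is computed for that sampled model, mirroring the posterior-sampling scheme whose Bayesian regret the paper bounds.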