Theoretical Computer Science

Regret bounds for restless Markov bandits



Abstract

We consider the restless Markov bandit problem, in which the state of each arm evolves according to a Markov process independently of the learner's actions. We suggest an algorithm that first represents the setting as an MDP which exhibits some special structural properties. In order to grasp this information we introduce the notion of ε-structured MDPs, which are a generalization of concepts like (approximate) state aggregation and MDP homomorphisms. We propose a general algorithm for learning ε-structured MDPs and show regret bounds that demonstrate that additional structural information enhances learning. Applied to the restless bandit setting, this algorithm achieves, after any T steps, regret of order O(√T) with respect to the best policy that knows the distributions of all arms. We make no assumptions on the Markov chains underlying each arm except that they are irreducible. In addition, we show that index-based policies are necessarily suboptimal for the considered problem.
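To make the setting concrete, here is a minimal sketch (not from the paper; the class name, transition matrices, and reward values are invented for illustration) of a restless bandit environment in Python: every arm's chain advances at every step, whether or not it is pulled, and the learner observes only the state and reward of the arm it chose.

```python
import random

class RestlessMarkovBandit:
    """Minimal sketch of a restless Markov bandit environment.

    Each arm is an irreducible Markov chain (irreducibility is the
    paper's only assumption on the chains). All chains transition at
    every step, regardless of which arm the learner pulls -- this is
    the 'restless' property.
    """

    def __init__(self, chains, rewards, seed=0):
        self.chains = chains    # chains[i][s] = transition probs of arm i from state s
        self.rewards = rewards  # rewards[i][s] = reward of arm i in state s
        self.states = [0] * len(chains)  # current (unobserved) state of every arm
        self.rng = random.Random(seed)

    def step(self, arm):
        """Pull `arm`: observe its current state and reward, then advance ALL chains."""
        obs = self.states[arm]
        reward = self.rewards[arm][obs]
        for i, chain in enumerate(self.chains):
            probs = chain[self.states[i]]
            self.states[i] = self.rng.choices(range(len(probs)), weights=probs)[0]
        return obs, reward

# Two arms, each a 2-state irreducible chain: one mixes slowly, one quickly.
bandit = RestlessMarkovBandit(
    chains=[[[0.9, 0.1], [0.1, 0.9]],   # slowly mixing arm
            [[0.5, 0.5], [0.5, 0.5]]],  # fast mixing arm
    rewards=[[0.0, 1.0], [0.2, 0.8]],
)
print(sum(r for _, r in (bandit.step(arm=0) for _ in range(100))))
```

A learner interacting with such an environment sees only what step() returns for the pulled arm, so it must infer the unobserved arms' dynamics from these partial observations; representing this inference problem as an MDP with special structure is the role the ε-structured MDP formulation plays in the paper.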
