Learning in a Changing World: Restless Multiarmed Bandit With Unknown Dynamics

Liu H.; Liu K.; Zhao Q.

Abstract

We consider the restless multiarmed bandit problem with unknown dynamics, in which a player chooses one of $N$ arms to play at each time step. The reward state of each arm transits according to an unknown Markovian rule when the arm is played and evolves according to an arbitrary unknown random process when it is passive. The performance of an arm selection policy is measured by regret, defined as the reward loss with respect to the case where the player knows which arm is the most rewarding and always plays the best arm. We construct a policy with an interleaving exploration and exploitation epoch structure that achieves a regret of logarithmic order. We further extend the problem to a decentralized setting where multiple distributed players share the arms without information exchange. Under both an exogenous restless model and an endogenous restless model, we show that a decentralized extension of the proposed policy preserves the logarithmic regret order achieved in the centralized setting. The results apply to adaptive learning in various dynamic systems and communication networks, as well as to financial investment.
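The abstract describes the policy only at the level of an interleaved exploration/exploitation epoch structure. The following Python sketch illustrates that idea under assumptions that are not taken from the paper: each arm is a hypothetical two-state Markov chain (`TwoStateArm`) that keeps evolving when passive, an exploration epoch is triggered whenever some arm has fewer than roughly d·log t samples (the constant `d` and the function `run_epoch_policy` are illustrative), and both exploration and exploitation block lengths double after each use. It is not the authors' policy; it only shows how such an epoch schedule can be organized and how the sample-path regret against always playing the best arm can be measured.

```python
import math
import random


class TwoStateArm:
    """Hypothetical restless arm: a two-state (0/1 reward) Markov chain.

    Assumption: the chain keeps evolving every slot whether or not it is
    played, one simple special case of the arbitrary passive dynamics
    allowed in the abstract.
    """

    def __init__(self, p01, p10, rng):
        self.p01, self.p10 = p01, p10  # P(0 -> 1) and P(1 -> 0)
        self.rng = rng
        self.state = 0

    @property
    def mean(self):
        # Stationary probability of state 1, i.e. the arm's mean reward.
        return self.p01 / (self.p01 + self.p10)

    def step(self):
        flip = self.p01 if self.state == 0 else self.p10
        if self.rng.random() < flip:
            self.state = 1 - self.state
        return self.state


def run_epoch_policy(arms, horizon, d=10.0):
    """Interleaved exploration/exploitation epochs (illustrative sketch only).

    If some arm has fewer than about d*log(t) samples, run an exploration
    epoch that plays every arm for one contiguous block; otherwise run an
    exploitation epoch on the arm with the best sample mean. Both block
    lengths double after each use (an assumed schedule).
    Returns (total_reward, play_counts).
    """
    n = len(arms)
    counts, sums = [0] * n, [0.0] * n
    t, total = 0, 0.0
    explore_len, exploit_len = 1, 1

    def play(i, length):
        nonlocal t, total
        for _ in range(length):
            if t >= horizon:
                return
            t += 1
            states = [arm.step() for arm in arms]  # every arm evolves
            counts[i] += 1  # but only arm i's reward is collected
            sums[i] += states[i]
            total += states[i]

    while t < horizon:
        if min(counts) < d * math.log(t + 2):
            for i in range(n):
                play(i, explore_len)
            explore_len *= 2
        else:
            best = max(range(n), key=lambda i: sums[i] / counts[i])
            play(best, exploit_len)
            exploit_len *= 2
    return total, counts


if __name__ == "__main__":
    rng = random.Random(0)
    arms = [TwoStateArm(0.2, 0.8, rng), TwoStateArm(0.6, 0.15, rng)]
    horizon = 20000
    total, counts = run_epoch_policy(arms, horizon)
    # Sample-path regret against always playing the arm with the best mean.
    regret = horizon * max(a.mean for a in arms) - total
    print(counts, round(regret, 1))
```

Because both block lengths double, the number of epochs (and hence of arm switches) in this sketch grows only logarithmically with the horizon, while the least-played arm is kept on the order of d·log t samples; this is the intuition, not a proof, behind the logarithmic regret order stated in the abstract.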

Bibliographic Information

Source
IEEE Transactions on Information Theory, 2013, No. 3, pp. 1902-1916 (15 pages)
Authors
Liu H.; Liu K.; Zhao Q.
Affiliation

Electrical and Computer Engineering, UC Davis, Davis, United States

Original format: PDF
Language: English
Keywords
Bayesian methods; Fading; Indexes; Loss measurement; Markov processes; Random processes; Transient analysis; Distributed learning; online learning; regret; restless multiarmed bandit (RMAB)

Similar Articles

1. Optimal learning dynamics of multiagent system in restless multiarmed bandit game [J]. Physica A: Statistical Mechanics and its Applications, 2020.
2. Optimal Policies for a Class of Restless Multiarmed Bandit Scheduling Problems with Applications to Sensor Management [J]. R. Washburn, M. Schneider. Journal of Advances in Information Fusion, 2008, No. 1.
3. Privacy-Preserving Collaborative Learning for Multiarmed Bandits in IoT [J]. Chen Shuzhen, Tao Youming, Yu Dongxiao. IEEE Internet of Things Journal, 2021, No. 5.
4. Femtocell Scheduling as a Restless Multiarmed Bandit Problem Using Partial Channel State Observation [C]. Hesham M. Elmaghraby, Keqin Liu, Zhi Ding. IEEE International Conference on Communications, 2018.
5. Learning in A Changing World: Restless Multi-Armed Bandit with Unknown Dynamics [D]. Liu, Haoyang, 2013.
6. Nash Equilibrium of Social-Learning Agents in a Restless Multiarmed Bandit Game [O]. Kazuaki Nakayama, Masato Hisakado, Shintaro Mori.
7. Optimal learning dynamics of multiagent system in restless multiarmed bandit game [O]. Kazuaki Nakayama, Ryuzo Nakamura, Masato Hisakado, 2020.
8. Learning in A Changing World: Non-Bayesian Restless Multi-Armed Bandit [R]. Liu, H., Liu, K., Zhao, Q., 2010.
