On a Class of Restless Multi-armed Bandits with Deterministic Policies



Abstract

We describe and analyze a restless multi-armed bandit (RMAB) in which, in each time-slot, the instantaneous reward from the playing of an arm depends on the time since the arm was last played. This model is motivated by recommendation systems where the payoff from a recommendation depends on the recommendation history. For an RMAB with N arms, and known reward functions for each arm that have a finite support (akin to a maximum memory) of M steps, we characterize the optimal policy that maximizes the infinite horizon time-average of the reward. Specifically, using a weighted-graph representation of the system evolution, we show that a periodic policy is optimal. Further, we show that the optimal periodic policy can be obtained using an algorithm with polynomial time and space complexity. Some extensions to the basic model are also presented; several more are possible. RMABs with such large state spaces for the arms have not been previously considered.
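
The weighted-graph view mentioned in the abstract can be illustrated with a small brute-force sketch. This is not the authors' algorithm: it enumerates every reachable state, so its cost grows with the number of states rather than matching the polynomial bound claimed in the abstract. The modelling choices are assumptions made for illustration: a state records, for each arm, the number of slots since it was last played (capped at M), `rewards[i][d-1]` is the integer reward for playing arm `i` after `d` slots, and the reward saturates once `d` reaches M. The function names `build_state_graph` and `max_cycle_mean` are hypothetical. Under these assumptions the best time-average reward equals the maximum cycle mean of the state graph, computed here with Karp's algorithm, and a cycle attaining it corresponds to a periodic policy.

```python
from fractions import Fraction


def build_state_graph(rewards, start):
    """Enumerate states reachable from `start` and the weighted edges between them.

    rewards[i][d-1] is the (integer) reward for playing arm i when d slots have
    passed since its last play; d is capped at M = len(rewards[i]) (assumption).
    Returns (states, edges) with edges as (from_index, to_index, reward).
    """
    M = len(rewards[0])
    index = {start: 0}
    states = [start]
    edges = []
    i = 0
    while i < len(states):
        s = states[i]
        for arm in range(len(rewards)):
            w = rewards[arm][s[arm] - 1]
            # The played arm resets to 1; every other arm ages by one slot, capped at M.
            t = tuple(1 if j == arm else min(d + 1, M) for j, d in enumerate(s))
            if t not in index:
                index[t] = len(states)
                states.append(t)
            edges.append((index[s], index[t], w))
        i += 1
    return states, edges


def max_cycle_mean(n, edges):
    """Karp's algorithm, maximum version; every vertex must be reachable from vertex 0."""
    # D[k][v] = largest total weight of a walk with exactly k edges from vertex 0 to v.
    D = [[None] * n for _ in range(n + 1)]
    D[0][0] = 0
    for k in range(1, n + 1):
        for u, v, w in edges:
            if D[k - 1][u] is not None:
                cand = D[k - 1][u] + w
                if D[k][v] is None or cand > D[k][v]:
                    D[k][v] = cand
    best = None
    for v in range(n):
        if D[n][v] is None:
            continue
        worst = min(
            Fraction(D[n][v] - D[k][v], n - k)
            for k in range(n)
            if D[k][v] is not None
        )
        if best is None or worst > best:
            best = worst
    return best


# Two arms, M = 3: arm 0 pays 6 only after resting 3 slots, arm 1 always pays 1.
states, edges = build_state_graph([[0, 0, 6], [1, 1, 1]], (3, 3))
print(max_cycle_mean(len(states), edges))
```

In this toy instance the best cycle plays arm 1 twice and then arm 0, collecting 1 + 1 + 6 over three slots. Exact rational arithmetic via `Fraction` avoids floating-point ties when comparing cycle means, which is why integer rewards are assumed.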


Bibliographic details

Source: International Conference on Signal Processing and Communications | 2018 | 516p | 5 pages
Conference location:
Authors: Prakirt Raj Jhunjhunwala; Sharayu Moharir; D. Manjunath; Aditya Gopalan
Author affiliations:
Conference organizer:
Original format: PDF
Language of text:
Chinese Library Classification: TN911.7-53
Keywords: Hidden Markov models; Manganese; History; Indexes; Transient analysis; Complexity theory; Markov processes


Similar literature


1. Optimality of greedy policy for a class of standard reward function of restless multi-armed bandit problem [J]. Wang K., Liu Q., Chen L. Signal Processing, IET. 2012, No. 6
2. On Optimality of Myopic Policy for Restless Multi-Armed Bandit Problem: An Axiomatic Approach [J]. Wang K., Chen L. Signal Processing, IEEE Transactions on. 2012, No. 1
3. Scheduling Periodic Real-Time Traffic in Lossy Wireless Networks as Restless Multi-Armed Bandit [J]. Jun Xu, Chengcheng Guo. Wireless Communications Letters, IEEE. 2019, No. 4
4. On a Class of Restless Multi-armed Bandits with Deterministic Policies [C]. Prakirt Raj Jhunjhunwala, Sharayu Moharir, D. Manjunath. International Conference on Signal Processing and Communication Systems. 2018
5. Learning in A Changing World: Restless Multi-Armed Bandit with Unknown Dynamics [D]. Liu, Haoyang. 2013
6. Indexability and Optimal Index Policies for a Class of Reinitialising Restless Bandits [O]. Sofía S. Villar
7. Optimality of Myopic Policy for a Class of Monotone Affine Restless Multi-Armed Bandits [O]. Parisa Mansourifard, Tara Javidi, Bhaskar Krishnamachari. 2013
8. Learning in A Changing World: Non-Bayesian Restless Multi-Armed Bandit [R]. Liu, H., Liu, K., Zhao, Q. 2010
