IEEE Conference on Decision and Control (CDC)

Optimality of myopic policy for a class of monotone affine restless multi-armed bandits


Abstract

We formulate a general class of restless multi-armed bandits with n independent and stochastically identical arms. Each arm is in a real-valued state s ∈ [s0, smax]. Selecting an arm in state s yields an immediate reward with expectation R(s). The state of the selected arm jumps stochastically from its current value s to smax with probability p(s) and to s0 with probability 1 − p(s). The states of the arms that are not selected evolve according to a function τ(s). We assume that τ(s), p(s), and R(s) are all monotonically increasing affine functions and that τ(s) is a contraction mapping. We then derive a condition on τ(s) under which the simple myopic policy, which at each time selects the arm with the highest expected immediate reward, is optimal. This extends and generalizes recent results in the literature on arms that evolve as two-state Markov chains.
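To make the model concrete, the following is a minimal Python sketch of the dynamics and the myopic policy. The specific affine coefficients for τ, p, and R, and the choice of n = 4 arms, are illustrative assumptions rather than values from the paper; they are picked only to satisfy the stated conditions (all three functions monotone increasing and affine, with τ a contraction mapping [s0, smax] into itself).

```python
import random

# Minimal sketch of the model: n arms with real-valued states in [S0, SMAX].
# The affine coefficients below are illustrative assumptions, chosen only to
# satisfy the abstract's conditions: R, p, tau monotone increasing and affine,
# with tau a contraction (slope < 1) mapping [S0, SMAX] into itself.

S0, SMAX = 0.0, 1.0

def R(s):    # expected immediate reward of playing an arm in state s
    return 0.2 + 0.8 * s

def p(s):    # probability the played arm jumps to SMAX (else it falls to S0)
    return 0.1 + 0.8 * s

def tau(s):  # passive dynamics for arms that are not played
    return 0.5 * s + 0.3

def myopic_step(states, rng):
    """One step of the myopic policy: play the arm with the highest expected
    immediate reward R(s); since R is increasing, that is the largest state."""
    i = max(range(len(states)), key=lambda k: R(states[k]))
    reward = R(states[i])  # track the expected reward rather than a noisy draw
    nxt = []
    for k, s in enumerate(states):
        if k == i:
            nxt.append(SMAX if rng.random() < p(s) else S0)  # stochastic jump
        else:
            nxt.append(tau(s))  # passive arms drift toward tau's fixed point
    return nxt, reward

if __name__ == "__main__":
    rng = random.Random(0)
    states = [rng.uniform(S0, SMAX) for _ in range(4)]  # n = 4 arms (assumed)
    total = 0.0
    for _ in range(1000):
        states, r = myopic_step(states, rng)
        total += r
    print(f"average expected reward per step: {total / 1000:.4f}")
```

With these coefficients the passive dynamics contract toward the fixed point 0.3/(1 − 0.5) = 0.6, so unplayed arms drift back toward an interior state; the paper's condition on τ(s) characterizes when ranking arms by R(s) alone remains optimal despite such future-state effects.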
