IEEE Conference on Decision and Control (CDC)

Optimality of myopic policy for a class of monotone affine restless multi-armed bandits


Abstract

We formulate a general class of restless multi-armed bandits with n independent and stochastically identical arms. Each arm is in a real-valued state s ∈ [s0, smax]. Selecting an arm in state s yields an immediate reward with expectation R(s). The state of the selected arm jumps stochastically from its current value s to smax with probability p(s) and to s0 with probability 1 − p(s). The states of the arms that are not selected evolve according to a function τ(s). We assume that τ(s), p(s), and R(s) are all monotonically increasing affine functions and that τ(s) is a contraction mapping. We then derive a condition on τ(s) under which the simple myopic policy, which at each time selects the arm with the highest expected immediate reward, is optimal. This extends and generalizes recent results in the literature on arms that evolve as two-state Markov chains.
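To make the model concrete, the following is a minimal Python sketch of the dynamics and the myopic policy. The specific affine coefficients for τ, p, and R, and the choice of n = 4 arms, are illustrative assumptions rather than values from the paper; they are picked only to satisfy the stated conditions (all three functions monotone increasing and affine, with τ a contraction mapping [s0, smax] into itself).

```python
import random

# Minimal sketch of the model: n arms with real-valued states in [S0, SMAX].
# The affine coefficients below are illustrative assumptions, chosen only to
# satisfy the abstract's conditions: R, p, tau monotone increasing and affine,
# with tau a contraction (slope < 1) mapping [S0, SMAX] into itself.

S0, SMAX = 0.0, 1.0

def R(s):    # expected immediate reward of playing an arm in state s
    return 0.2 + 0.8 * s

def p(s):    # probability the played arm jumps to SMAX (else it falls to S0)
    return 0.1 + 0.8 * s

def tau(s):  # passive dynamics for arms that are not played
    return 0.5 * s + 0.3

def myopic_step(states, rng):
    """One step of the myopic policy: play the arm with the highest expected
    immediate reward R(s); since R is increasing, that is the largest state."""
    i = max(range(len(states)), key=lambda k: R(states[k]))
    reward = R(states[i])  # track the expected reward rather than a noisy draw
    nxt = []
    for k, s in enumerate(states):
        if k == i:
            nxt.append(SMAX if rng.random() < p(s) else S0)  # stochastic jump
        else:
            nxt.append(tau(s))  # passive arms drift toward tau's fixed point
    return nxt, reward

if __name__ == "__main__":
    rng = random.Random(0)
    states = [rng.uniform(S0, SMAX) for _ in range(4)]  # n = 4 arms (assumed)
    total = 0.0
    for _ in range(1000):
        states, r = myopic_step(states, rng)
        total += r
    print(f"average expected reward per step: {total / 1000:.4f}")
```

With these coefficients the passive dynamics contract toward the fixed point 0.3/(1 − 0.5) = 0.6, so unplayed arms drift back toward an interior state; the paper's condition on τ(s) characterizes when ranking arms by R(s) alone remains optimal despite such future-state effects.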
