IEEE Conference on Decision and Control (CDC)

Optimality of myopic policy for a class of monotone affine restless multi-armed bandits




We formulate a general class of restless multi-armed bandits with n independent and stochastically identical arms. Each arm is in a real-valued state s ∈ [s_0, s_max]. Selecting an arm with state s yields an immediate reward with expectation R(s). The state of the selected arm jumps stochastically from its current value s to s_max with probability p(s), or to s_0 with probability 1 − p(s). The states of the arms that are not selected evolve according to a function τ(s). We assume that τ(s), p(s), and R(s) are all monotonically increasing affine functions, and that τ(s) is a contraction mapping. We then derive a condition on τ(s) under which the simple myopic policy, which selects at each time the arm with the highest immediate reward, is optimal. This extends and generalizes recent results in the literature pertaining to arms evolving as two-state Markov chains.
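The model in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: the state bounds and the coefficients of τ(s), p(s), and R(s) are assumed values chosen only so that all three functions are increasing and affine, τ has slope below 1 (a contraction), and τ maps the state interval into itself. The sketch runs the myopic policy and accumulates expected rewards; it does not check the paper's optimality condition on τ(s), which the abstract does not state.

```python
import random

# Assumed illustrative parameters (not taken from the paper).
S0, SMAX = 0.2, 0.8  # lower and upper state bounds


def tau(s):
    """Passive dynamics: increasing affine map with slope 0.6 < 1 (contraction)."""
    return 0.2 + 0.6 * s


def p(s):
    """Probability that a selected arm jumps to SMAX (increasing affine)."""
    return s


def R(s):
    """Expected immediate reward for selecting an arm in state s (increasing affine)."""
    return s


def myopic_arm(states):
    """Index of the arm with the highest expected immediate reward."""
    return max(range(len(states)), key=lambda i: R(states[i]))


def step(states, arm, rng):
    """Advance one time slot: the selected arm jumps, all others follow tau."""
    new_states = [tau(s) for s in states]
    new_states[arm] = SMAX if rng.random() < p(states[arm]) else S0
    return new_states


def simulate(n_arms=4, horizon=50, seed=0):
    """Run the myopic policy and return (total expected reward, final states)."""
    rng = random.Random(seed)
    states = [rng.uniform(S0, SMAX) for _ in range(n_arms)]
    total = 0.0
    for _ in range(horizon):
        arm = myopic_arm(states)
        total += R(states[arm])
        states = step(states, arm, rng)
    return total, states


if __name__ == "__main__":
    total, states = simulate()
    print("total expected reward:", total)
    print("final states:", states)
```

Because R(s) is increasing, the myopic choice here reduces to picking the arm with the largest state; other policies could be compared by swapping out `myopic_arm`.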


