INDEXABILITY AND OPTIMAL INDEX POLICIES FOR A CLASS OF REINITIALISING RESTLESS BANDITS

Abstract

Motivated by a class of Partially Observable Markov Decision Processes with applications in surveillance systems, in which a set of imperfectly observed state processes is to be inferred from a subset of available observations through a Bayesian approach, we formulate and analyze a special family of multi-armed restless bandit problems. We consider the problem of finding an optimal policy for observing the processes that maximizes the total expected net reward over an infinite time horizon, subject to resource availability. From the Lagrangian relaxation of the original problem, an index policy can be derived as long as the existence of the Whittle index is ensured. We demonstrate that this class of reinitializing bandits, in which a project's state deteriorates while it is active and resets to its initial state when it is passive, until the project's completion, possesses the structural property of indexability, and we further show how to compute the index in closed form. In general, the Whittle index rule for restless bandit problems does not achieve optimality. However, we show that the proposed Whittle index rule is optimal for the problem under study in the case of stochastically heterogeneous arms under the expected total criterion, and that it is further recovered by a simple tractable rule referred to as the 1-limited Round Robin rule. Moreover, we illustrate the significant suboptimality of another widely used heuristic, the Myopic index rule, by computing its suboptimality gap in closed form. We present numerical studies which illustrate, for more general instances, the performance advantages of the Whittle index rule over other simple heuristics.
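
For readers unfamiliar with the terms used in the abstract, the following is a sketch of the standard textbook form of Whittle's relaxation for a generic restless bandit, not the paper's own model or notation. Every symbol here is an assumption introduced for illustration: $K$ arms, $M$ of them activated per period, arm states $X_k(t)$, actions $a_k(t) \in \{0,1\}$ (1 = active), rewards $R_k$, and a discount factor $\beta \in (0,1)$ used only to keep the sums finite (the abstract itself refers to an expected total criterion).

```latex
% Original problem: exactly M arms active in every period (assumed form).
\max_{\pi} \; \mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{\infty} \beta^{t}
      \sum_{k=1}^{K} R_k\big(X_k(t), a_k(t)\big) \right]
\quad \text{s.t.} \quad \sum_{k=1}^{K} a_k(t) = M \ \text{ for all } t .

% Relaxation: the constraint is only required to hold in discounted expectation.
\mathbb{E}^{\pi}\!\left[ \sum_{t=0}^{\infty} \beta^{t} \sum_{k=1}^{K} a_k(t) \right]
  = \frac{M}{1-\beta} .

% Attaching a multiplier \lambda (a charge per active period) decouples the
% Lagrangian into one single-arm problem per k:
\max_{\pi_k} \; \mathbb{E}^{\pi_k}\!\left[ \sum_{t=0}^{\infty} \beta^{t}
      \Big( R_k\big(X_k(t), a_k(t)\big) - \lambda\, a_k(t) \Big) \right] .

% Indexability: the set P_k(\lambda) of states in which the passive action is
% optimal grows monotonically from the empty set to the whole state space as
% \lambda increases from -\infty to +\infty. The Whittle index of state x is
\lambda_k^{*}(x) = \inf \{ \lambda : x \in P_k(\lambda) \} .
```

Under these assumptions, the Whittle index rule activates, in each period, the $M$ arms whose current states have the largest indices $\lambda_k^{*}(X_k(t))$.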

Bibliographic details

  • Journal: other
  • Author: Sofía S. Villar
  • Year (volume), issue: -1 (30), 1
  • Pages: 1–23
  • Total pages: 30
  • Original format: PDF
