
INDEXABILITY AND OPTIMAL INDEX POLICIES FOR A CLASS OF REINITIALISING RESTLESS BANDITS

Abstract

Motivated by a class of partially observable Markov decision processes arising in surveillance systems, in which a set of imperfectly observed state processes must be inferred from a subset of available observations through a Bayesian approach, we formulate and analyze a special family of multi-armed restless bandit problems. We consider the problem of finding an optimal policy for observing the processes that maximizes the total expected net reward over an infinite time horizon, subject to resource availability. From the Lagrangian relaxation of the original problem, an index policy can be derived, provided that the Whittle index exists. We demonstrate that this class of reinitializing bandits, in which a project's state deteriorates while active and resets to its initial state when passive until its completion, possesses the structural property of indexability, and we show how to compute the index in closed form. In general, the Whittle index rule for restless bandit problems is not optimal. However, we show that the proposed Whittle index rule is optimal for the problem under study in the case of stochastically heterogeneous arms under the expected total criterion, and that it is recovered by a simple tractable rule referred to as the 1-limited Round Robin rule. Moreover, we illustrate the significant suboptimality of another widely used heuristic, the Myopic index rule, by computing its suboptimality gap in closed form. We present numerical studies that illustrate, for more general instances, the performance advantages of the Whittle index rule over other simple heuristics.

Bibliographic details

Journal: other
Author: Sofía S. Villar
Year (volume), issue: -1 (30), 1
Year: -1
Pages: 1–23
Total pages: 30
Original format: PDF

Similar literature

1. ASYMPTOTICALLY OPTIMAL PRIORITY POLICIES FOR INDEXABLE AND NONINDEXABLE RESTLESS BANDITS [J]. Verloop I. M. The Annals of applied probability: an official journal of the Institute of Mathematical Statistics. 2016, No. 4
2. Optimality of greedy policy for a class of standard reward function of restless multi-armed bandit problem [J]. Wang K., Liu Q., Chen L. Signal Processing, IET. 2012, No. 6
3. Optimal Policies for a Class of Restless Multiarmed Bandit Scheduling Problems with Applications to Sensor Management [J]. R. Washburn, M. Schneider. Journal of Advances in Information Fusion. 2008, No. 1
4. Optimality of myopic policy for a class of monotone affine restless multi-armed bandits [C]. Mansourifard Parisa. IEEE Conference on Decision and Control; CDC. 2012
5. Stochastic optimization over parallel queues: Channel-blind scheduling, restless bandit, and optimal delay. [D]. Li, Chih-ping. 2011
6. Nash Equilibrium of Social-Learning Agents in a Restless Multiarmed Bandit Game [O]. Kazuaki Nakayama, Masato Hisakado, Shintaro Mori. -1
7. INDEXABILITY AND OPTIMAL INDEX POLICIES FOR A CLASS OF REINITIALISING RESTLESS BANDITS [O]. Sofía S. Villar. 2015
8. Myopic Policy for a Class of Restless Bandit Problems with Applications in Dynamic Multichannel Access [R]. Liu, K., Zhao, Q. 2009
