Optimality of greedy policy for a class of standard reward function of restless multi-armed bandit problem

Wang K.; Liu Q.; Chen L.

Abstract

In this study, the authors consider the restless multi-armed bandit problem, one of the most well-studied generalisations of the celebrated stochastic multi-armed bandit problem in decision theory. The problem is known to be PSPACE-hard to approximate to any non-trivial factor, so an optimal policy is very difficult to obtain. A natural alternative is the greedy policy, which is attractive for its stability and simplicity. In general, however, the greedy policy loses optimality because of its intrinsically myopic behaviour. In this study, by analysing a class of so-called standard reward functions, the authors establish a closed-form condition on the discount factor β under which the greedy policy is guaranteed to be optimal under the discounted expected reward criterion; in particular, the condition with β = 1 indicates that the greedy policy is optimal under the average accumulative reward criterion. This kind of standard reward function can therefore be used to judge the optimality of the greedy policy without any complicated calculation. Some examples in cognitive radio networks are presented to verify the effectiveness of the mathematical result in judging the optimality of the greedy policy.
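To make the terms in the abstract concrete, here is a minimal Python sketch of a greedy (myopic) policy and a discounted reward sum. It is not the authors' model. It assumes a common cognitive-radio setting in which each arm is a two-state (good/bad) Markov channel tracked through a belief value, and the names `update_belief`, `greedy_choice`, `discounted_return`, `p11` and `p01` are hypothetical, introduced only for illustration.

```python
from typing import List, Optional


def update_belief(belief: float, p11: float, p01: float,
                  observed: Optional[bool]) -> float:
    """One-step belief update for an assumed two-state Markov channel.

    belief   -- probability that the channel is currently good
    p11, p01 -- assumed transition probabilities good->good and bad->good
    observed -- True/False if the channel was sensed this slot, None otherwise
    """
    if observed is True:
        return p11
    if observed is False:
        return p01
    # An unobserved arm still evolves; this is what makes the bandit "restless".
    return belief * p11 + (1.0 - belief) * p01


def greedy_choice(beliefs: List[float]) -> int:
    """Pick the arm with the largest immediate expected reward (here, the belief)."""
    return max(range(len(beliefs)), key=lambda i: beliefs[i])


def discounted_return(rewards: List[float], beta: float) -> float:
    """Sum of beta**t * r_t over a finite reward sequence."""
    return sum((beta ** t) * r for t, r in enumerate(rewards))
```

With `beta = 1` the last function reduces to a plain sum of rewards, which is the undiscounted case the abstract associates with the average accumulative reward criterion. The greedy rule looks only at the current beliefs, which is the myopic behaviour the abstract refers to.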


Bibliographic record

Source: Signal Processing, IET, 2012, No. 6, pp. 584-593 (10 pages)
Authors: Wang K.; Liu Q.; Chen L.
Affiliation: School of Information, Wuhan University of Technology, Hubei 430070, People's Republic of China
Original format: PDF
Language: English

Similar documents

1. On Optimality of Myopic Policy for Restless Multi-Armed Bandit Problem: An Axiomatic Approach [J] . Wang K., Chen L. Signal Processing, IEEE Transactions on . 2012, No. 1

2. ON THE ASYMPTOTIC OPTIMALITY OF GREEDY INDEX HEURISTICS FOR MULTI-ACTION RESTLESS BANDITS [J] . Hodge D. J., Glazebrook K. D. Advances in applied probability . 2015, No. 3

3. AN ASYMPTOTICALLY OPTIMAL HEURISTIC FOR GENERAL NONSTATIONARY FINITE-HORIZON RESTLESS MULTI-ARMED, MULTI-ACTION BANDITS [J] . Zayas-Caban Gabriel, Jasin Stefanus, Wang Guihua Advances in applied probability . 2019, No. 3

4. Optimality of myopic policy for a class of monotone affine restless multi-armed bandits [C] . Mansourifard Parisa IEEE Conference on Decision and Control;CDC . 2012

5. Learning in A Changing World: Restless Multi-Armed Bandit with Unknown Dynamics [D] . Liu, Haoyang 2013

6. INDEXABILITY AND OPTIMAL INDEX POLICIES FOR A CLASS OF REINITIALISING RESTLESS BANDITS [O] . Sofía S. Villar -1

7. Optimality of greedy policy for a class of standard reward function of restless multi-armed bandit problem [O] . K. Wang, Q. Liu, L. Chen 2012

8. Learning in A Changing World: Non-Bayesian Restless Multi-Armed Bandit [R] . Liu, H., Liu, K., Zhao, Q. 2010

