...

Optimality of greedy policy for a class of standard reward function of restless multi-armed bandit problem

Abstract

In this study, the authors consider the restless multi-armed bandit problem, one of the most well-studied generalisations of the celebrated stochastic multi-armed bandit problem in decision theory. However, the problem is known to be PSPACE-hard to approximate to any non-trivial factor, so an optimal policy is very difficult to obtain because of its high complexity. A natural alternative is the greedy policy, which is attractive for its stability and simplicity. In general, however, the greedy policy loses optimality because of its intrinsically myopic behaviour. In this study, by analysing one class of so-called standard reward functions, the authors establish a closed-form condition on the discount factor β under which the optimality of the greedy policy is guaranteed under the discounted expected reward criterion; in particular, the condition at β = 1 indicates the optimality of the greedy policy under the average accumulative reward criterion. Thus, for this class of standard reward functions, the optimality of the greedy policy can be judged easily, without any complicated calculation. Examples from cognitive radio networks are presented to verify the effectiveness of the mathematical result in judging the optimality of the greedy policy.
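To make the terms "greedy policy" and "discount factor" concrete, here is a minimal sketch of a greedy (myopic) policy in a commonly used cognitive-radio model of the restless bandit. The modelling choices are assumptions for illustration and are not taken from the paper: each channel is an independent two-state (idle/busy) Markov chain with transition probabilities p01 (busy to idle) and p11 (idle to idle), the user senses k channels per slot, and the reward is the number of sensed channels that are idle. The function names and parameters are hypothetical. The sketch does not implement the authors' closed-form condition on β. It only shows the myopic choice (highest belief first) and the β-discounted reward that such a condition concerns.

```python
import random
from typing import List


def belief_update(omega: float, sensed: bool, idle: bool, p01: float, p11: float) -> float:
    """Return the next-slot belief P(idle) for one channel (hypothetical helper).

    omega: current belief P(idle); p01 = P(busy -> idle); p11 = P(idle -> idle).
    """
    if sensed:
        # Sensing reveals the current state, so the belief resets to a transition probability.
        return p11 if idle else p01
    # Unsensed channel: propagate the belief one step through the Markov chain.
    return omega * p11 + (1.0 - omega) * p01


def greedy_select(beliefs: List[float], k: int) -> List[int]:
    """Myopic policy: sense the k channels with the largest expected immediate reward."""
    order = sorted(range(len(beliefs)), key=lambda i: beliefs[i], reverse=True)
    return order[:k]


def discounted_greedy_reward(n: int, k: int, p01: float, p11: float,
                             beta: float, horizon: int, seed: int = 0) -> float:
    """Beta-discounted reward of the greedy policy along one simulated sample path."""
    rng = random.Random(seed)
    stationary = p01 / (p01 + 1.0 - p11)  # stationary probability that a channel is idle
    states = [rng.random() < stationary for _ in range(n)]
    beliefs = [stationary] * n
    total, weight = 0.0, 1.0
    for _ in range(horizon):
        chosen = set(greedy_select(beliefs, k))
        # Assumed reward: number of sensed channels that are idle in this slot.
        total += weight * sum(1 for i in chosen if states[i])
        weight *= beta
        beliefs = [belief_update(beliefs[i], i in chosen, states[i], p01, p11)
                   for i in range(n)]
        # Each channel evolves independently as a two-state Markov chain.
        states = [rng.random() < (p11 if s else p01) for s in states]
    return total
```

For example, `discounted_greedy_reward(n=4, k=1, p01=0.2, p11=0.8, beta=0.9, horizon=200)` returns the discounted reward of one sample path. Averaging over many seeds approximates the expected discounted reward. Setting `beta=1.0` and dividing by `horizon` gives an estimate of the average reward instead.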

Bibliographic details

  • Source
    IET Signal Processing, 2012, no. 6, pp. 584-593 (10 pages)
  • Authors

    Wang K.; Liu Q.; Chen L.

  • Affiliation

    School of Information, Wuhan University of Technology, Hubei 430070, People's Republic of China

  • Original format: PDF
  • Language: English
