Journal: Probability in the Engineering and Informational Sciences

ON THE IDENTIFICATION AND MITIGATION OF WEAKNESSES IN THE KNOWLEDGE GRADIENT POLICY FOR MULTI-ARMED BANDITS

Abstract

The knowledge gradient (KG) policy was originally proposed for online ranking and selection problems but has recently been adapted for use in online decision-making in general and multi-armed bandit problems (MABs) in particular. We study its use in a class of exponential family MABs and identify weaknesses, including a propensity to take actions which are dominated with respect to both exploitation and exploration. We propose variants of KG which avoid such errors. These new policies include an index heuristic, which deploys a KG approach to develop an approximation to the Gittins index. A numerical study shows this policy to perform well over a range of MABs, including those for which index policies are not optimal. While KG does not take dominated actions when bandits are Gaussian, it fails to be index consistent and, when arms are correlated, appears not to enjoy a performance advantage over competitor policies that would compensate for its greater computational demands.
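
To make the one-step lookahead behind KG concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Bernoulli bandit with independent Beta(alpha, beta) posteriors, which is one member of the exponential family; the abstract does not specify the models studied. The names `kg_factor`, `online_kg_choice`, and the `weight` parameter are hypothetical. The scoring rule "posterior mean plus weight times KG factor" is one common form of online KG, where `weight` stands in for the remaining horizon or a discount-based multiplier.

```python
from typing import List, Tuple

# Assumption: each arm is summarised by a Beta(alpha, beta) posterior
# over its Bernoulli success probability.
Arm = Tuple[float, float]


def posterior_mean(arm: Arm) -> float:
    alpha, beta = arm
    return alpha / (alpha + beta)


def kg_factor(arms: List[Arm], a: int) -> float:
    """Expected one-step gain in the best posterior mean from pulling arm a."""
    means = [posterior_mean(arm) for arm in arms]
    best_now = max(means)
    # Best mean among the other arms, which one pull of arm a leaves unchanged.
    best_other = max(
        (m for i, m in enumerate(means) if i != a), default=float("-inf")
    )
    alpha, beta = arms[a]
    p_success = means[a]  # predictive probability of observing a success
    mean_up = (alpha + 1) / (alpha + beta + 1)  # posterior mean after a success
    mean_down = alpha / (alpha + beta + 1)      # posterior mean after a failure
    expected_best = (
        p_success * max(best_other, mean_up)
        + (1 - p_success) * max(best_other, mean_down)
    )
    return expected_best - best_now


def online_kg_choice(arms: List[Arm], weight: float) -> int:
    """Pick the arm maximising posterior mean + weight * KG factor."""
    scores = [
        posterior_mean(arm) + weight * kg_factor(arms, a)
        for a, arm in enumerate(arms)
    ]
    return max(range(len(arms)), key=scores.__getitem__)
```

With `weight = 0` the rule reduces to the greedy policy. Larger values place more emphasis on the information a single pull is expected to yield. Because the lookahead covers only one observation, an arm whose next outcome cannot change which arm has the highest posterior mean receives a KG factor of zero in this sketch.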
