ON THE IDENTIFICATION AND MITIGATION OF WEAKNESSES IN THE KNOWLEDGE GRADIENT POLICY FOR MULTI-ARMED BANDITS

Edwards James; Fearnhead Paul; Glazebrook Kevin

Abstract

The knowledge gradient (KG) policy was originally proposed for online ranking and selection problems but has recently been adapted for use in online decision-making in general and multi-armed bandit problems (MABs) in particular. We study its use in a class of exponential family MABs and identify weaknesses, including a propensity to take actions which are dominated with respect to both exploitation and exploration. We propose variants of KG which avoid such errors. These new policies include an index heuristic, which deploys a KG approach to develop an approximation to the Gittins index. A numerical study shows this policy to perform well over a range of MABs, including those for which index policies are not optimal. While KG does not take dominated actions when bandits are Gaussian, it fails to be index consistent and, when arms are correlated, appears not to enjoy a performance advantage over competitor policies sufficient to compensate for its greater computational demands.
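
As a concrete reference point for the abstract, the Python sketch below computes the standard one-step knowledge gradient (KG) value and the online KG decision rule for two simple cases. The first case is independent Gaussian arms with a known, common noise variance. The second is Bernoulli arms with Beta posteriors, which is an exponential-family example. This is not the authors' code, and it does not implement the paper's proposed variants or its KG-based approximation to the Gittins index. It assumes a discounted infinite-horizon criterion with discount factor gamma. Under that criterion the online KG rule scores arm a as mu_a + gamma/(1 - gamma) times its KG value. The function names kg_gaussian, kg_bernoulli and kg_policy are hypothetical.

```python
import math


def _phi(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _Phi(z: float) -> float:
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kg_gaussian(mu, sigma2, noise_var):
    """One-step KG value of each arm, independent Gaussian arms (at least 2 arms).

    mu[arm], sigma2[arm]: posterior mean and variance of the arm's mean reward.
    noise_var: observation noise variance, assumed known and common to all arms.
    """
    values = []
    for arm in range(len(mu)):
        best_other = max(m for j, m in enumerate(mu) if j != arm)
        # Standard deviation of the change in this arm's posterior mean
        # caused by one more observation of it.
        s = sigma2[arm] / math.sqrt(sigma2[arm] + noise_var)
        if s == 0.0:
            values.append(0.0)  # no uncertainty left, nothing to learn
            continue
        z = -abs(mu[arm] - best_other) / s
        values.append(s * (z * _Phi(z) + _phi(z)))
    return values


def kg_bernoulli(alpha, beta):
    """One-step KG value of each arm, Bernoulli arms with Beta posteriors (at least 2 arms)."""
    mu = [al / (al + be) for al, be in zip(alpha, beta)]
    current_best = max(mu)
    values = []
    for arm in range(len(mu)):
        best_other = max(m for j, m in enumerate(mu) if j != arm)
        n = alpha[arm] + beta[arm]
        up = (alpha[arm] + 1) / (n + 1)  # posterior mean after a success
        down = alpha[arm] / (n + 1)      # posterior mean after a failure
        # Expected best posterior mean after one pull of this arm; the
        # predictive probability of a success equals its posterior mean.
        expected_best = (mu[arm] * max(up, best_other)
                         + (1.0 - mu[arm]) * max(down, best_other))
        # KG is nonnegative in exact arithmetic; clamp tiny rounding errors.
        values.append(max(0.0, expected_best - current_best))
    return values


def kg_policy(mu, kg_values, gamma):
    """Online KG rule for a discount factor 0 <= gamma < 1:
    return argmax over arms of mu[arm] + gamma / (1 - gamma) * kg_values[arm]."""
    weight = gamma / (1.0 - gamma)
    scores = [m + weight * v for m, v in zip(mu, kg_values)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    alpha, beta = [3, 1], [1, 1]  # hypothetical Beta posteriors for two arms
    means = [al / (al + be) for al, be in zip(alpha, beta)]
    kg = kg_bernoulli(alpha, beta)
    print(kg, kg_policy(means, kg, gamma=0.9))
```

The Bernoulli version enumerates the two possible outcomes of one more pull. If an arm's next observation cannot change which arm looks best, its KG value is zero, and its score reduces to its posterior mean.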

Bibliographic details

Source
Probability in the Engineering and Informational Sciences | 2017, Issue 2 | pp. 239-263 | 25 pages
Authors
Edwards James; Fearnhead Paul; Glazebrook Kevin
Author affiliations

Lancaster University, STOR-i Centre for Doctoral Training, Lancaster LA1 4YF, England

Lancaster University, Department of Mathematics and Statistics, Lancaster LA1 4YF, England

Lancaster University, Department of Management Science, Lancaster LA1 4YX, England

Indexed in: Science Citation Index (SCI), USA
Original format: PDF
Language: English
Keywords
stochastic dynamic programming

Similar documents

1. ASYMPTOTICALLY OPTIMAL MULTI-ARMED BANDIT POLICIES UNDER A COST CONSTRAINT [J]. Burnetas Apostolos, Kanavetas Odysseas, Katehakis Michael N. Probability in the Engineering and Informational Sciences, 2017, Issue 3.

2. Multi-Access Communications With Energy Harvesting: A Multi-Armed Bandit Model and the Optimality of the Myopic Policy [J]. Blasco Pol, Gunduz Deniz. IEEE Journal on Selected Areas in Communications, 2015, Issue 3.

3. Lower Bounds and Selectivity of Weak-Consistent Policies in Stochastic Multi-Armed Bandit Problem [J]. Salomon Antoine, Audibert Jean-Yves, Alaoui Issam El. Journal of Machine Learning Research, 2013, January issue.

4. Linear Scalarized Knowledge Gradient in the Multi-Objective Multi-Armed Bandits Problem [C]. Saba Yahyaa, Madalina M. Drugan, Bernard Manderick. European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, 2014.

5. Essays on sequential analysis: Multi-armed bandit with availability constraints and sequential change detection and identification [D]. Yamazaki, Kazutoshi. 2009.

6. Smoking and the bandit: A preliminary study of smoker and non-smoker differences in exploratory behavior measured with a multi-armed bandit task [O]. Merideth A. Addicott, John M. Pearson, Jessica Wilson.

7. On the identification and mitigation of weaknesses in the Knowledge Gradient policy for multi-armed bandits [O]. Edwards James, Fearnhead Paul, Glazebrook Kevin David. 2017.
