JMLR: Workshop and Conference Proceedings

Variational inference for the multi-armed contextual bandit


Abstract

In many biomedical, scientific, and engineering problems, one must sequentially decide which action to take next so as to maximize rewards. One general class of algorithms for optimizing interactions with the world, while simultaneously learning how the world operates, is the multi-armed bandit setting and, in particular, the contextual bandit case. In this setting, for each executed action, one observes rewards that depend on a given 'context', available at each interaction with the world. The Thompson sampling algorithm has recently been shown to enjoy provable optimality properties for this set of problems, and to perform well in real-world settings. It facilitates generative and interpretable modeling of the problem at hand. Nevertheless, the design and complexity of the model limit its application, since one must both sample from the modeled distributions and calculate their expected rewards. We here show how these limitations can be overcome using variational inference to approximate complex models, applying to the reinforcement learning case advances developed for inference in the machine learning community over the past two decades. We consider contextual multi-armed bandit applications where the true reward distribution is unknown and complex, and approximate it with a mixture model whose parameters are inferred via variational inference. We show that the proposed variational Thompson sampling approach is accurate in approximating the true distribution and attains reduced regret even with complex reward distributions. The proposed algorithm is valuable for practical scenarios where restrictive modeling assumptions are undesirable.
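The abstract describes replacing the exact posterior used in Thompson sampling with a variational approximation to a mixture-model reward distribution. As a point of reference only, below is a minimal sketch of the standard contextual Thompson sampling loop that such an approach builds on, assuming a per-arm Bayesian linear-Gaussian reward model with a conjugate (exact) Gaussian posterior; it is not the paper's algorithm, and all class and variable names are illustrative.

```python
import numpy as np


class LinearThompsonSampling:
    """Per-arm Bayesian linear regression with Thompson sampling.

    Stand-in for the paper's variational mixture posterior: here the
    posterior over each arm's weights is an exact Gaussian (Gaussian
    prior, known noise variance), so no approximation is needed.
    """

    def __init__(self, n_arms, dim, noise_var=1.0, prior_var=1.0):
        self.noise_var = noise_var
        # Posterior precision matrix and accumulated X^T y, one per arm.
        self.precision = [np.eye(dim) / prior_var for _ in range(n_arms)]
        self.xty = [np.zeros(dim) for _ in range(n_arms)]

    def select_arm(self, context):
        # Draw one weight vector per arm from its posterior and play the
        # arm whose sampled expected reward is largest.
        sampled = []
        for prec, xty in zip(self.precision, self.xty):
            cov = np.linalg.inv(prec)
            mean = cov @ xty / self.noise_var
            w = np.random.multivariate_normal(mean, cov)
            sampled.append(context @ w)
        return int(np.argmax(sampled))

    def update(self, arm, context, reward):
        # Conjugate posterior update for the played arm only.
        self.precision[arm] += np.outer(context, context) / self.noise_var
        self.xty[arm] += reward * context


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, n_arms = 5, 3
    true_w = rng.normal(size=(n_arms, dim))    # hidden per-arm weights
    agent = LinearThompsonSampling(n_arms, dim)
    regret = 0.0
    for t in range(2000):
        x = rng.normal(size=dim)
        a = agent.select_arm(x)
        r = x @ true_w[a] + rng.normal()       # noisy linear reward
        agent.update(a, x, r)
        regret += (true_w @ x).max() - x @ true_w[a]
    print(f"cumulative regret after 2000 rounds: {regret:.1f}")
```

In the setting described by the abstract, the sampling step would instead draw from a variational approximation to a mixture-model posterior fitted to the observed rewards, rather than from this exact conjugate posterior.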
