Best arm identification in multi-armed bandits with delayed feedback

Aditya Grover; Todor Markov; Peter Attia; Norman Jin; Nicolas Perkins; Bryan Cheong; Michael Chen; Zi Yang; Stephen Harris; William Chueh; Stefano Ermon


Abstract

In this paper, we propose a generalization of the best arm identification problem in stochastic multi-armed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedback. The delay in feedback increases the effective sample complexity of the algorithm, but can be offset by partial feedback received before a pull is completed. We propose a general modeling framework for structuring the partial feedback, and as a special case we introduce efficient algorithms for best arm identification in settings where the partial feedback is a biased or unbiased estimator of the final outcome of the pull. Additionally, we propose a novel extension of the algorithms to the parallel MAB setting where an agent can control a batch of arms. Experiments on simulated as well as real-world datasets of policy search for charging chemical batteries and hyperparameter optimization for mixed integer programming demonstrate that exploiting the structure of partial and delayed feedback can lead to significant improvements over baselines on both sequential and parallel MAB.
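
The claim that partial feedback can offset the cost of delay can be illustrated with a toy simulation. The sketch below is not the paper's algorithm: it is a textbook successive-elimination routine with Hoeffding confidence bounds, run under an assumed feedback model in which a pull lasts `delay` time steps, each step emits a Bernoulli partial feedback that is an unbiased estimate of the arm's mean, and the final outcome of the pull is the average of those partial feedbacks. The function name `identify_best_arm` and all of its parameters are hypothetical.

```python
import math
import random


def identify_best_arm(means, delay=5, delta=0.05, use_partial=True,
                      max_rounds=20000, seed=0):
    """Toy successive elimination under an assumed delayed-feedback model.

    Assumption (not from the paper): a pull of arm i takes `delay` time
    steps, each step emits a Bernoulli(means[i]) partial feedback, and the
    final outcome of the pull is the average of those partial feedbacks.
    Returns (index of the selected arm, elapsed time steps).
    """
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))
    sums = [0.0] * k    # running sum of observations per arm
    counts = [0] * k    # number of observations per arm
    elapsed = 0         # total time steps spent pulling arms

    for _ in range(max_rounds):
        # Pull every surviving arm once; pulls are sequential in time.
        for i in active:
            partial = [float(rng.random() < means[i]) for _ in range(delay)]
            elapsed += delay
            if use_partial:
                # Every intermediate signal counts as one observation.
                sums[i] += sum(partial)
                counts[i] += delay
            else:
                # Only the final outcome of the pull is observed.
                sums[i] += sum(partial) / delay
                counts[i] += 1

        # All surviving arms share the same observation count n.
        n = counts[active[0]]
        radius = math.sqrt(math.log(4 * k * n * n / delta) / (2 * n))
        estimates = {i: sums[i] / counts[i] for i in active}
        leader = max(estimates.values())
        # Drop arms whose estimate is separated from the leader's.
        active = [i for i in active if leader - estimates[i] <= 2 * radius]
        if len(active) == 1:
            break

    best = max(active, key=lambda i: sums[i] / counts[i])
    return best, elapsed


if __name__ == "__main__":
    arm_means = [0.2, 0.5, 0.8]  # hypothetical arm means
    print(identify_best_arm(arm_means, use_partial=True))
    print(identify_best_arm(arm_means, use_partial=False))
```

In this toy model both variants spend the same time per pull, but the variant that uses partial feedback collects `delay` observations per pull instead of one, so its confidence radius shrinks faster per unit of elapsed time. The biased-estimator and parallel settings mentioned in the abstract are not covered by this sketch.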


Bibliographic record

Source
JMLR: Workshop and Conference Proceedings | 2018, No. 3 | 10 pages
Authors
Aditya Grover; Todor Markov; Peter Attia; Norman Jin; Nicolas Perkins; Bryan Cheong; Michael Chen; Zi Yang; Stephen Harris; William Chueh; Stefano Ermon
Original format
PDF
Chinese Library Classification
Artificial intelligence theory

Similar literature

1. Randomized allocation with nonparametric estimation for contextual multi-armed bandits with delayed rewards [J]. Arya Sakshi, Yang Yuhong. Statistics & Probability Letters. 2020, No. 1

2. A numerical analysis of allocation strategies for the multi-armed bandit problem under delayed rewards conditions in digital campaign management [J]. Martin Miguel, Jimenez-Martin Antonio, Mateos Alfonso. Neurocomputing. 2019, No. Octa21

3. Priority index heuristic for multi-armed bandit problems with set-up costs and/or set-up time delays [J]. F. DUSONCHET, M.-O. HONGLER. International Journal of Computer Integrated Manufacturing. 2006, No. 3

4. PAC Identification of Many Good Arms in Stochastic Multi-Armed Bandits [C]. Arghya Roy Chaudhuri, Shivaram Kalyanakrishnan. International Conference on Machine Learning. 2019

5. Essays on sequential analysis: Multi-armed bandit with availability constraints and sequential change detection and identification. [D]. Yamazaki, Kazutoshi. 2009

6. Smoking and the bandit: A preliminary study of smoker and non-smoker differences in exploratory behavior measured with a multi-armed bandit task [O]. Merideth A. Addicott, John M. Pearson, Jessica Wilson

7. The Multi-Armed Bandit Problem under Delayed Rewards Conditions in Digital Campaign Management [O]. M. Martin, A. Jimenez-Martin, A. Mateos. 2019
