Journal: JMLR: Workshop and Conference Proceedings

Best arm identification in multi-armed bandits with delayed feedback

Abstract

In this paper, we propose a generalization of the best arm identification problem in stochastic multi-armed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedback. The delay in feedback increases the effective sample complexity of the algorithm, but can be offset by partial feedback received before a pull is completed. We propose a general modeling framework for the structure in the partial feedback, and as a special case we introduce efficient algorithms for best arm identification in settings where the partial feedback is a biased or unbiased estimator of the final outcome of the pull. Additionally, we propose a novel extension of the algorithms to the parallel MAB setting, where an agent can control a batch of arms. Experiments on simulated and real-world datasets, covering policy search for charging chemical batteries and hyperparameter optimization for mixed integer programming, demonstrate that exploiting the structure of partial and delayed feedback can lead to significant improvements over baselines in both sequential and parallel MAB.
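
To make the effect of delay concrete, the following is a minimal sketch of textbook successive elimination for best arm identification, run in a simulation where each reward becomes visible only `delay` rounds after the pull. It is not the algorithm proposed in the paper and it does not model partial feedback. The function name `successive_elimination`, the Bernoulli reward model, and the Hoeffding-style confidence radius are all assumptions chosen for illustration.

```python
import math
import random


def successive_elimination(means, delay, delta=0.05, max_rounds=100_000, seed=0):
    """Illustrative only: round-robin successive elimination in which the
    reward of every pull is revealed `delay` rounds after the pull is made.
    Returns (index of the selected arm, number of rounds used)."""
    rng = random.Random(seed)
    k = len(means)
    active = set(range(k))
    counts = [0] * k      # number of rewards observed so far, per arm
    sums = [0.0] * k      # sum of observed rewards, per arm
    pending = []          # (arrival_round, arm, reward) not yet observed

    def radius(arm):
        # Anytime Hoeffding radius for rewards in [0, 1], union-bounded
        # over all arms and all sample counts (assumed choice).
        n = counts[arm]
        return math.sqrt(math.log(4 * k * n * n / delta) / (2 * n))

    for t in range(1, max_rounds + 1):
        # Pull every surviving arm once; the outcome is hidden until t + delay.
        for arm in sorted(active):
            reward = 1.0 if rng.random() < means[arm] else 0.0
            pending.append((t + delay, arm, reward))

        # Absorb the feedback that has arrived by round t.
        arrived = [p for p in pending if p[0] <= t]
        pending = [p for p in pending if p[0] > t]
        for _, arm, reward in arrived:
            counts[arm] += 1
            sums[arm] += reward

        if any(counts[a] == 0 for a in active):
            continue  # no feedback yet for some surviving arm

        # Drop arms whose upper bound falls below the best lower bound.
        best_lcb = max(sums[a] / counts[a] - radius(a) for a in active)
        active = {a for a in active if sums[a] / counts[a] + radius(a) >= best_lcb}
        if len(active) == 1:
            return next(iter(active)), t

    # Budget exhausted: fall back to the best empirical mean.
    return max(active, key=lambda a: sums[a] / counts[a]), max_rounds


if __name__ == "__main__":
    for d in (0, 20):
        arm, rounds = successive_elimination([0.2, 0.5, 0.8], delay=d)
        print(f"delay={d}: selected arm {arm} after {rounds} rounds")
```

In this sketch no elimination can happen before round `delay + 1`, and arms that would already have been discarded keep being pulled while their feedback is in flight. That wasted sampling is the cost the abstract describes partial feedback as offsetting.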
