Conference: Asilomar Conference on Signals, Systems and Computers

Time-varying stochastic multi-armed bandit problems

Abstract

In this paper, we consider a time-varying stochastic multi-armed bandit (MAB) problem where the unknown reward distribution of each arm can change arbitrarily over time. We obtain a lower bound on the regret order and demonstrate that an online learning algorithm achieves this lower bound. We further consider a piece-wise stationary model of the arm reward distributions and establish the regret performance of an online learning algorithm in terms of the number of change points experienced by the reward distributions over the time horizon.
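To make the piecewise-stationary setting concrete, the following is a minimal illustrative sketch, not the algorithm analyzed in the paper (the abstract does not specify it). It assumes Bernoulli rewards whose means switch at known-to-the-simulator change points, and it swaps in a generic sliding-window UCB rule as a stand-in learner. The names `piecewise_means`, `sliding_window_ucb`, and the `window` parameter are hypothetical, introduced only for illustration.

```python
import math
import random
from collections import deque


def piecewise_means(t, change_points, means_per_segment):
    """Return the arm means in effect at round t (assumed piecewise-constant)."""
    segment = sum(1 for c in change_points if t >= c)
    return means_per_segment[segment]


def sliding_window_ucb(horizon, change_points, means_per_segment, window, seed=0):
    """Simulate a sliding-window UCB learner; return cumulative dynamic pseudo-regret.

    Illustrative stand-in only: statistics are computed from the last
    `window` pulls, so observations from before a change point are
    eventually forgotten.
    """
    rng = random.Random(seed)
    n_arms = len(means_per_segment[0])
    history = deque()          # (arm, reward) pairs inside the window
    counts = [0] * n_arms      # pulls per arm inside the window
    sums = [0.0] * n_arms      # reward totals per arm inside the window
    regret = 0.0

    for t in range(horizon):
        means = piecewise_means(t, change_points, means_per_segment)

        # Pull any arm with no observations in the current window first.
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            arm = untried[0]
        else:
            effective_t = min(t, window)
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(effective_t) / counts[a]),
            )

        # Bernoulli reward drawn from the currently active distribution.
        reward = 1.0 if rng.random() < means[arm] else 0.0
        history.append((arm, reward))
        counts[arm] += 1
        sums[arm] += reward

        # Evict the oldest observation once the window is full.
        if len(history) > window:
            old_arm, old_reward = history.popleft()
            counts[old_arm] -= 1
            sums[old_arm] -= old_reward

        # Regret is measured against the best arm at this round.
        regret += max(means) - means[arm]

    return regret


if __name__ == "__main__":
    # Two arms whose means swap at round 1000 (one change point).
    total = sliding_window_ucb(
        horizon=2000,
        change_points=[1000],
        means_per_segment=[[0.9, 0.1], [0.1, 0.9]],
        window=200,
    )
    print(f"cumulative dynamic regret: {total:.1f}")
```

In this sketch the regret is measured against the best arm at each round rather than a single fixed arm, which is the natural benchmark when the reward distributions change over the horizon. The window length trades off estimation accuracy within a stationary segment against how quickly stale observations are discarded after a change.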
