SIAM Journal on Control and Optimization

THE CONTINUUM-ARMED BANDIT PROBLEM



Abstract

In this paper we consider the multiarmed bandit problem where the arms are chosen from a subset of the real line and the mean rewards are assumed to be a continuous function of the arms. The problem with an infinite number of arms is much more difficult than the usual one with a finite number of arms because the built-in learning task is now infinite dimensional. We devise a kernel estimator-based learning scheme for the mean reward as a function of the arms. Using this learning scheme, we construct a class of certainty equivalence control with forcing schemes and derive asymptotic upper bounds on their learning loss. To the best of our knowledge, these bounds are the strongest rates yet available. Moreover, they are stronger than the o(n) required for optimality with respect to the average-cost-per-unit-time criterion. [References: 28]
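The abstract describes two ingredients: a kernel estimator of the mean-reward function over the continuum of arms, and a certainty-equivalence rule that acts greedily on that estimate while occasionally "forcing" exploratory pulls. The Python sketch below illustrates that general pattern only; the Epanechnikov kernel, the bandwidth and forcing schedules, and the helper names `kernel_estimates` and `run` are illustrative assumptions and not the paper's exact scheme or rates.

```python
import numpy as np

rng = np.random.default_rng(0)

def epanechnikov(u):
    # Epanechnikov kernel, supported on [-1, 1] (illustrative choice).
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kernel_estimates(grid, arms, rewards, h):
    # Nadaraya-Watson estimates of the mean reward at every grid point,
    # using the arms pulled so far and bandwidth h.
    u = (grid[:, None] - np.asarray(arms)[None, :]) / h
    w = epanechnikov(u)                        # shape (len(grid), len(arms))
    s = w.sum(axis=1)
    num = w @ np.asarray(rewards)
    return np.where(s > 0, num / np.maximum(s, 1e-12), 0.0)

def run(mean_reward, horizon=2000, noise=0.1):
    # Certainty equivalence with forcing: usually play the arm that maximizes
    # the current kernel estimate; at a vanishing rate, force a random arm.
    arms, rewards = [], []
    grid = np.linspace(0.0, 1.0, 201)          # candidate arms in [0, 1]
    for n in range(1, horizon + 1):
        h = n ** (-1 / 5)                      # shrinking bandwidth (illustrative rate)
        if not arms or rng.random() < n ** (-1 / 3):
            x = rng.uniform(0.0, 1.0)          # forcing step: explore a random arm
        else:
            est = kernel_estimates(grid, arms, rewards, h)
            x = float(grid[int(np.argmax(est))])   # greedy (certainty-equivalence) step
        arms.append(x)
        rewards.append(mean_reward(x) + noise * rng.standard_normal())
    return arms, rewards

# Example: a smooth mean-reward function peaked at x = 0.3.
arms, rewards = run(lambda x: 1.0 - (x - 0.3) ** 2)
```

Under this kind of scheme, the learning loss is driven by the forced exploratory pulls and by the bias and variance of the kernel estimate, which is why the achievable bounds depend on the smoothness of the mean-reward function.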


