Journal: Performance Evaluation Review

Unimodal Bandits with Continuous Arms: Order-optimal Regret without Smoothness

Abstract

We consider stochastic bandit problems with a continuous set of arms in which the expected reward is a continuous and unimodal function of the arm. For these problems, we propose the Stochastic Polychotomy (SP) algorithms and derive finite-time upper bounds on their regret and optimization error. We show that, for a class of reward functions, the SP algorithm achieves a regret and an optimization error with optimal scalings, i.e., $O(\sqrt{T})$ and $O(1/\sqrt{T})$ (up to a logarithmic factor), respectively.
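
To make the problem setting concrete, the following is a minimal sketch of a noisy interval-narrowing search over a continuous arm set with a unimodal expected reward. It is not the SP algorithm, whose details the abstract does not give. It is a plain trisection with a fixed number of pulls per arm, and every name in it (`noisy_trisection`, `pull`, `rounds`, `samples`, the tent-shaped reward) is a hypothetical choice made for illustration.

```python
import random


def noisy_trisection(pull, lo=0.0, hi=1.0, rounds=12, samples=200):
    """Narrow [lo, hi] around the maximiser of a unimodal mean reward.

    pull(x) returns one noisy reward for arm x (hypothetical interface).
    Each round compares the empirical mean reward at the two interior
    third-points and discards the outer third next to the worse point.
    Illustration only: this is NOT the paper's SP algorithm.
    """
    for _ in range(rounds):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        mean_a = sum(pull(a) for _ in range(samples)) / samples
        mean_b = sum(pull(b) for _ in range(samples)) / samples
        if mean_a < mean_b:
            lo = a  # for a unimodal mean, the peak then lies in [a, hi]
        else:
            hi = b  # otherwise the peak lies in [lo, b]
    return (lo + hi) / 2.0


# Assumed example: tent-shaped mean reward peaking at x = 0.3,
# observed with Gaussian noise of standard deviation 0.1.
rng = random.Random(0)
PEAK = 0.3


def pull(x):
    return 1.0 - abs(x - PEAK) + rng.gauss(0.0, 0.1)


estimate = noisy_trisection(pull)
print(f"estimated best arm: {estimate:.3f}")
```

The fixed per-arm sample count keeps the sketch short. It does not adapt to how close the two compared means are, so it should not be expected to match the regret scalings stated in the abstract.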
