Unimodal Bandits with Continuous Arms: Order-optimal Regret without Smoothness

Richard Combes; Alexandre Proutiere; Alexandra Fauquette

Abstract

We consider stochastic bandit problems with a continuous set of arms, where the expected reward is a continuous and unimodal function of the arm. For these problems, we propose the Stochastic Polychotomy (SP) algorithms and derive finite-time upper bounds on their regret and optimization error. We show that, for a class of reward functions, the SP algorithm achieves a regret and an optimization error with optimal scalings, i.e., $O(\sqrt{T})$ and $O(1/\sqrt{T})$ (up to a logarithmic factor), respectively.
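
For readers unfamiliar with the two performance measures named in the abstract, the block below writes them out in the standard form used for continuous-armed bandits. The notation is an assumption made for illustration and is not taken from the paper: arms $x \in [0,1]$, expected reward $\mu(x)$ with maximizer $x^\star$, arm $x_t$ played in round $t$, and arm $\hat{x}_T$ returned after $T$ rounds. The $\tilde{O}$ notation hides the logarithmic factor mentioned in the abstract.

```latex
% Assumed notation (illustrative, not from the paper):
%   \mu : [0,1] \to \mathbb{R}  continuous and unimodal, maximized at x^\star
%   x_t                         arm played in round t
%   \hat{x}_T                   arm recommended after T rounds
\begin{align*}
  R(T) &= T\,\mu(x^\star) - \mathbb{E}\Big[\sum_{t=1}^{T} \mu(x_t)\Big]
        && \text{(regret)} \\
  E(T) &= \mu(x^\star) - \mathbb{E}\big[\mu(\hat{x}_T)\big]
        && \text{(optimization error)} \\
  R(T) &= \tilde{O}\big(\sqrt{T}\big), \qquad
  E(T) = \tilde{O}\big(1/\sqrt{T}\big)
        && \text{(scalings claimed in the abstract)}
\end{align*}
```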

Bibliographic record

Source: Performance Evaluation Review, 2020, No. 1, pp. 107-108 (2 pages)
Authors: Richard Combes; Alexandre Proutiere; Alexandra Fauquette
Affiliations:
Centrale-Supelec, L2S, Gif-sur-Yvette, France;
KTH, Stockholm, Sweden;
KTH, Stockholm, Sweden
Original format: PDF
Language: English

Similar literature

1. Coordination without communication: optimal regret in two players multi-armed bandits [J]. Sébastien Bubeck, Thomas Budzinski. JMLR: Workshop and Conference Proceedings, 2020, No. 2010.
2. Smooth Contextual Bandits: Bridging the Parametric and Non-differentiable Regret Regimes [J]. Yichun Hu, Nathan Kallus, Xiaojie Mao. JMLR: Workshop and Conference Proceedings, 2020, No. 2010.
3. Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback [J]. Ambuj Tewari, Ankan Saha. JMLR: Workshop and Conference Proceedings, 2011, No. 2011.
4. Combinatorial multi-armed bandit problem with probabilistically triggered arms: A case with bounded regret [C]. A. Ömer Saritaç, Cem Tekin. IEEE Global Conference on Signal and Information Processing, 2017.
5. From Stability to Low-Regret Algorithms in Stochastic Multi-Armed Bandits [D]. Huang, Kuan-Sung. 2021.
6. Decision-making without a brain: how an amoeboid organism solves the two-armed bandit [O]. Chris R. Reid, Hannelore MacDonald, Richard P. Mann. 2016.
7. Unimodal Bandits without Smoothness [O]. Combes, Richard; Proutiere, Alexandre. 2015.
