A Closer Look at Small-loss Bounds for Bandits with Graph Feedback

Chung-Wei Lee; Haipeng Luo; Mengxiao Zhang

Abstract

We study small-loss bounds for adversarial multi-armed bandits with graph feedback, that is, adaptive regret bounds that depend on the loss of the best arm or related quantities, instead of the total number of rounds. We derive the first small-loss bound for general strongly observable graphs, resolving an open problem of Lykouris et al. (2018). Specifically, we develop an algorithm with regret $\tilde{\mathcal{O}}(\sqrt{\kappa L_*})$ where $\kappa$ is the clique partition number and $L_*$ is the loss of the best arm, and for the special case of self-aware graphs where every arm has a self-loop, we improve the regret to $\tilde{\mathcal{O}}(\min\{\sqrt{\alpha T}, \sqrt{\kappa L_*}\})$ where $\alpha \leq \kappa$ is the independence number. Our results significantly improve and extend those by Lykouris et al. (2018) who only consider self-aware undirected graphs. Furthermore, we also take the first attempt at deriving small-loss bounds for weakly observable graphs. We first prove that no typical small-loss bounds are achievable in this case, and then propose algorithms with alternative small-loss bounds in terms of the loss of some specific subset of arms. A surprising side result is that $\tilde{\mathcal{O}}(\sqrt{T})$ regret is achievable even for weakly observable graphs as long as the best arm has a self-loop. Our algorithms are based on the Online Mirror Descent framework but require a suite of novel techniques that might be of independent interest. Moreover, all our algorithms can be made parameter-free without the knowledge of the environment.
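The abstract distinguishes strongly observable, weakly observable, and self-aware feedback graphs without defining them. As a rough illustration only, the sketch below assumes the definitions commonly used in the feedback-graph literature: an arm is observable if some arm's play reveals its loss, and strongly observable if it has a self-loop or is revealed by every other arm. The names `Graph`, `in_neighbors`, `classify`, and `is_self_aware` are hypothetical and do not come from the paper.

```python
from typing import Dict, Set

# Assumed encoding: arm -> set of arms whose losses are revealed when it is played.
Graph = Dict[int, Set[int]]


def in_neighbors(graph: Graph, arm: int) -> Set[int]:
    """Return the arms whose play reveals the loss of `arm`."""
    return {j for j, revealed in graph.items() if arm in revealed}


def classify(graph: Graph) -> str:
    """Classify a feedback graph under the assumed standard definitions."""
    arms = set(graph)
    all_strong = True
    for i in arms:
        sources = in_neighbors(graph, i)
        if not sources:
            # No arm ever reveals the loss of arm i.
            return "unobservable"
        has_self_loop = i in sources
        revealed_by_all_others = (arms - {i}) <= sources
        if not (has_self_loop or revealed_by_all_others):
            all_strong = False
    return "strongly observable" if all_strong else "weakly observable"


def is_self_aware(graph: Graph) -> bool:
    """Self-aware: every arm has a self-loop."""
    return all(i in graph[i] for i in graph)


# Standard bandit feedback: each arm reveals only its own loss.
bandit = {0: {0}, 1: {1}, 2: {2}}
# Loopless clique: each arm reveals every loss except its own.
loopless = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
# Only arm 0 reveals anything; arms 1 and 2 are observable but not strongly.
weak = {0: {0, 1, 2}, 1: set(), 2: set()}

for name, g in [("bandit", bandit), ("loopless", loopless), ("weak", weak)]:
    print(name, classify(g), is_self_aware(g))
```

In the `weak` example only arm 0 has a self-loop, so under these assumed definitions it is the kind of graph to which the abstract's $\tilde{\mathcal{O}}(\sqrt{T})$ side result would apply when arm 0 is the best arm.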


Bibliographic record

Source: JMLR: Workshop and Conference Proceedings, 2020, No. 2010, 49 pages
Authors: Chung-Wei Lee; Haipeng Luo; Mengxiao Zhang
Original format: PDF
Keywords: multi-armed bandits; feedback graph; small-loss bounds

