JMLR: Workshop and Conference Proceedings

A Closer Look at Small-loss Bounds for Bandits with Graph Feedback

Abstract

We study small-loss bounds for adversarial multi-armed bandits with graph feedback, that is, adaptive regret bounds that depend on the loss of the best arm or related quantities, instead of the total number of rounds. We derive the first small-loss bound for general strongly observable graphs, resolving an open problem of Lykouris et al. (2018). Specifically, we develop an algorithm with regret $\mathcal{\tilde{O}}(\sqrt{\kappa L_*})$ where $\kappa$ is the clique partition number and $L_*$ is the loss of the best arm, and for the special case of self-aware graphs where every arm has a self-loop, we improve the regret to $\mathcal{\tilde{O}}(\min\{\sqrt{\alpha T}, \sqrt{\kappa L_*}\})$ where $\alpha \leq \kappa$ is the independence number. Our results significantly improve and extend those by Lykouris et al. (2018) who only consider self-aware undirected graphs. Furthermore, we also take the first attempt at deriving small-loss bounds for weakly observable graphs. We first prove that no typical small-loss bounds are achievable in this case, and then propose algorithms with alternative small-loss bounds in terms of the loss of some specific subset of arms. A surprising side result is that $\mathcal{\tilde{O}}(\sqrt{T})$ regret is achievable even for weakly observable graphs as long as the best arm has a self-loop. Our algorithms are based on the Online Mirror Descent framework but require a suite of novel techniques that might be of independent interest. Moreover, all our algorithms can be made parameter-free without the knowledge of the environment.
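The abstract distinguishes strongly observable, weakly observable, and self-aware feedback graphs without defining them. The sketch below is an illustration only, not taken from the paper. It assumes the definitions commonly used in the feedback-graph literature: an arm is strongly observable if it has a self-loop or is observed by every other arm, and a graph is weakly observable if every arm is observed by some arm but not all arms are strongly observable. The names `classify_feedback_graph`, `is_self_aware`, and the `out_edges` representation are hypothetical.

```python
from typing import Dict, Set


def classify_feedback_graph(out_edges: Dict[int, Set[int]]) -> str:
    """Classify a directed feedback graph (assumed standard definitions).

    out_edges[i] is the set of arms whose losses are revealed when arm i
    is played. Every arm must appear as a key, even with an empty set.
    """
    arms = set(out_edges)
    # in_neighbors[j]: the arms that reveal the loss of arm j when played.
    in_neighbors = {j: {i for i in arms if j in out_edges[i]} for j in arms}

    all_strong = True
    for j in arms:
        if not in_neighbors[j]:
            # No arm ever reveals the loss of j.
            return "unobservable"
        has_self_loop = j in in_neighbors[j]
        seen_by_all_others = (arms - {j}) <= in_neighbors[j]
        if not (has_self_loop or seen_by_all_others):
            all_strong = False
    return "strongly observable" if all_strong else "weakly observable"


def is_self_aware(out_edges: Dict[int, Set[int]]) -> bool:
    """Self-aware graph: every arm has a self-loop."""
    return all(i in out_edges[i] for i in out_edges)


# Standard bandit feedback: each arm reveals only its own loss.
bandit = {0: {0}, 1: {1}, 2: {2}}
# Loopless clique: each arm reveals every loss except its own.
loopless_clique = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
# Only arm 0 reveals anything; arms 1 and 2 have no self-loop.
revealing_arm = {0: {0, 1, 2}, 1: set(), 2: set()}
```

Under these assumed definitions, `bandit` is strongly observable and self-aware, `loopless_clique` is strongly observable but not self-aware, and `revealing_arm` is only weakly observable, although arm 0 does have a self-loop.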