Journal: SIAM Journal on Computing

NONSTOCHASTIC MULTI-ARMED BANDITS WITH GRAPH-STRUCTURED FEEDBACK



Abstract

We introduce and study a partial-information model of online learning in which a decision maker repeatedly chooses from a finite set of actions and observes some subset of the associated losses. This naturally models situations where the losses of different actions are related, so that knowing the loss of one action provides information about the losses of other actions. Moreover, it generalizes and interpolates between the well-studied full-information setting (where all losses are revealed) and the bandit setting (where only the loss of the action chosen by the player is revealed). We provide several algorithms addressing different variants of our setting, and give tight regret bounds that depend on combinatorial properties of the information feedback structure.
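To make the feedback model concrete, the following is a minimal Python sketch of an exponential-weights learner that receives graph-structured feedback. It is an illustration under stated assumptions, not necessarily the algorithm analyzed in the paper: the function names, the representation of the feedback structure as a list of sets (`observes[i]` is the set of actions whose losses are revealed when action `i` is played), and the learning rate `eta` are all hypothetical choices made for this example. Each observed loss is divided by the probability that it would be observed, which keeps the loss estimates unbiased.

```python
import math
import random


def full_information_graph(k):
    """Playing any action reveals the losses of all k actions."""
    return [set(range(k)) for _ in range(k)]


def bandit_graph(k):
    """Playing action i reveals only the loss of action i."""
    return [{i} for i in range(k)]


def run_exponential_weights(losses, observes, eta, rng):
    """Exponential weights with importance-weighted loss estimates.

    losses[t][i] is the loss in [0, 1] of action i at round t.
    observes[i] is the set of actions revealed when i is played
    (a hypothetical encoding of the feedback structure).
    Returns the learner's cumulative loss and its final distribution.
    """
    k = len(observes)
    probs = [1.0 / k] * k
    total_loss = 0.0
    for round_losses in losses:
        played = rng.choices(range(k), weights=probs)[0]
        total_loss += round_losses[played]
        new_weights = list(probs)
        for j in observes[played]:
            # Probability that the loss of j is revealed this round.
            q_j = sum(probs[i] for i in range(k) if j in observes[i])
            estimate = round_losses[j] / q_j
            new_weights[j] *= math.exp(-eta * estimate)
        # Renormalize every round to avoid numerical underflow.
        z = sum(new_weights)
        probs = [w / z for w in new_weights]
    return total_loss, probs


if __name__ == "__main__":
    k, rounds = 4, 500
    # Toy loss sequence: action 0 is always best.
    losses = [[0.0] + [1.0] * (k - 1) for _ in range(rounds)]
    for name, graph in [("full information", full_information_graph(k)),
                        ("bandit", bandit_graph(k))]:
        loss, probs = run_exponential_weights(
            losses, graph, eta=0.1, rng=random.Random(0))
        print(name, loss, [round(p, 3) for p in probs])
```

The two helper graphs correspond to the two extremes named in the abstract; any other list of sets in which each action observes itself gives a setting in between.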
