Published in: JMLR: Workshop and Conference Proceedings

Non-Stochastic Multi-Player Multi-Armed Bandits: Optimal Rate With Collision Information, Sublinear Without


Abstract

We consider the non-stochastic version of the (cooperative) multi-player multi-armed bandit problem. The model assumes no communication and no shared randomness at all between the players, and furthermore when two (or more) players select the same action this results in a maximal loss. We prove the first $\sqrt{T}$-type regret guarantee for this problem, assuming only two players, and under the feedback model where collisions are announced to the colliding players. We also prove the first sublinear regret guarantee for the feedback model where collision information is not available, namely $T^{1-\frac{1}{2m}}$ where $m$ is the number of players.
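The collision rule and the stated exponent can be illustrated with a minimal sketch. It is not the paper's algorithm. It only encodes the loss model described in the abstract, assuming losses lie in [0, 1] so that the maximal loss is 1. The names `round_losses` and `no_collision_info_exponent` are hypothetical and chosen for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def round_losses(
    actions: Sequence[int],
    losses: Sequence[float],
    max_loss: float = 1.0,  # assumption: losses are scaled to [0, 1]
) -> List[Tuple[float, bool]]:
    """Return (loss, collided) for each player in a single round.

    actions[i] is the arm chosen by player i; losses[a] is the loss the
    adversary assigned to arm a in this round. Players who pick the same
    arm collide and each suffer the maximal loss.
    """
    counts = Counter(actions)
    result = []
    for arm in actions:
        collided = counts[arm] > 1
        result.append((max_loss if collided else losses[arm], collided))
    return result


def no_collision_info_exponent(m: int) -> float:
    """Exponent of T in the rate T^(1 - 1/(2m)) for m players."""
    return 1.0 - 1.0 / (2 * m)


if __name__ == "__main__":
    # Players 0 and 1 both choose arm 0 and collide; player 2 is alone on arm 2.
    print(round_losses([0, 0, 2], [0.1, 0.5, 0.3]))
    # With two players and no collision information the exponent is 3/4.
    print(no_collision_info_exponent(2))
```

In the feedback model with collision information, a player would observe the `collided` flag. In the model without it, a player would see only the loss value, so a collision cannot be told apart from an arm whose loss happens to be maximal. The exponent $1-\frac{1}{2m}$ approaches 1 as $m$ grows, so the guarantee without collision information weakens as players are added.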
