JMLR: Workshop and Conference Proceedings

Bandits with Delayed, Aggregated Anonymous Feedback

Abstract

We study a variant of the stochastic $K$-armed bandit problem, which we call "bandits with delayed, aggregated anonymous feedback". In this problem, when the player pulls an arm, a reward is generated; however, it is not immediately observed. Instead, at the end of each round the player observes only the sum of a number of previously generated rewards which happen to arrive in the given round. The rewards are stochastically delayed and, due to the aggregated nature of the observations, the information about which arm led to a particular reward is lost. The question is: what is the cost of the information loss due to this delayed, aggregated anonymous feedback? Previous works have studied bandits with stochastic, non-anonymous delays and found that the regret increases only by an additive factor relating to the expected delay. In this paper, we show that this additive regret increase can be maintained in the harder delayed, aggregated anonymous feedback setting when the expected delay (or a bound on it) is known. We provide an algorithm that matches the worst-case regret of the non-anonymous problem exactly when the delays are bounded, and up to logarithmic factors or an additive variance term for unbounded delays.
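To make the feedback model concrete, here is a minimal Python sketch of the observation process only. It is not the paper's algorithm. It assumes Bernoulli rewards and non-negative integer delays, and the function name `simulate_aggregated_feedback` and its `delay_sampler` and `policy` callbacks are hypothetical names chosen for illustration. Each pulled arm generates a reward that is scheduled to arrive after a random delay. At the end of each round the player sees only the total of whatever rewards arrive then, with no arm labels attached.

```python
import random
from collections import defaultdict
from typing import Callable, List


def simulate_aggregated_feedback(
    means: List[float],
    delay_sampler: Callable[[random.Random], int],
    policy: Callable[[int], int],
    horizon: int,
    seed: int = 0,
) -> List[float]:
    """Return the per-round aggregated, anonymous observations.

    Hypothetical sketch: Bernoulli rewards with the given means, and
    non-negative integer delays drawn by `delay_sampler`.
    """
    rng = random.Random(seed)
    # Maps an arrival round to the total reward landing in that round.
    # Only a sum is stored, so the identity of the arm is lost here.
    pending = defaultdict(float)
    observations = []
    for t in range(horizon):
        arm = policy(t)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        delay = delay_sampler(rng)
        if delay < 0:
            raise ValueError("delays must be non-negative")
        pending[t + delay] += reward
        # At the end of round t the player sees only the aggregated sum.
        observations.append(pending.pop(t, 0.0))
    return observations
```

For example, `simulate_aggregated_feedback([0.7, 0.4], lambda rng: rng.randint(0, 3), lambda t: t % 2, horizon=10)` returns ten per-round sums. The player cannot tell from these sums which arm produced which reward. Rewards scheduled to arrive after the horizon are never observed.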