Bandits with Delayed, Aggregated Anonymous Feedback

Ciara Pike-Burke; Shipra Agrawal; Csaba Szepesvari; Steffen Grunewalder

Abstract

We study a variant of the stochastic $K$-armed bandit problem, which we call "bandits with delayed, aggregated anonymous feedback". In this problem, when the player pulls an arm, a reward is generated but not immediately observed. Instead, at the end of each round the player observes only the sum of the previously generated rewards that happen to arrive in that round. The rewards are stochastically delayed and, because the observations are aggregated, the information about which arm led to a particular reward is lost. The question is: what is the cost of the information loss due to this delayed, aggregated anonymous feedback? Previous works have studied bandits with stochastic, non-anonymous delays and found that the regret increases only by an additive factor relating to the expected delay. In this paper, we show that this additive regret increase can be maintained in the harder delayed, aggregated anonymous feedback setting when the expected delay (or a bound on it) is known. We provide an algorithm that matches the worst-case regret of the non-anonymous problem exactly when the delays are bounded, and up to logarithmic factors or an additive variance term for unbounded delays.
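
The feedback model described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm; it only mimics the environment. The class name `DelayedAggregatedBandit`, the Bernoulli rewards, the uniform bounded delays, and the round-robin player are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class DelayedAggregatedBandit:
    """Toy environment: rewards arrive late and are summed anonymously."""

    def __init__(self, means, max_delay, seed=0):
        self.means = means            # Bernoulli mean of each arm (assumed reward model)
        self.max_delay = max_delay    # delays uniform on {0, ..., max_delay} (assumed)
        self.rng = random.Random(seed)
        self.t = 0                    # index of the current round
        self.pending = defaultdict(float)  # arrival round -> sum of rewards due then

    def pull(self, arm):
        """Play `arm` and return only the sum of rewards arriving this round."""
        reward = 1.0 if self.rng.random() < self.means[arm] else 0.0
        delay = self.rng.randint(0, self.max_delay)
        # Only the arrival round is stored, so the identity of the arm is lost.
        self.pending[self.t + delay] += reward
        observed = self.pending.pop(self.t, 0.0)
        self.t += 1
        return observed


def run_round_robin(env, n_arms, horizon):
    """Play arms in turn and record the aggregated observation of each round."""
    return [env.pull(t % n_arms) for t in range(horizon)]


if __name__ == "__main__":
    env = DelayedAggregatedBandit(means=[0.2, 0.8], max_delay=5, seed=42)
    print(run_round_robin(env, n_arms=2, horizon=20))
```

Each observation may mix rewards from several earlier pulls of different arms, or be zero when nothing arrives, which is why the player cannot attribute a reward to the arm that produced it.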

Bibliographic details

Source
JMLR: Workshop and Conference Proceedings | 2018, Issue 2010 | 9 pages
Authors
Ciara Pike-Burke; Shipra Agrawal; Csaba Szepesvari; Steffen Grunewalder
Original format: PDF
