Feedback graph regret bounds for Thompson Sampling and UCB

Thodoris Lykouris; Éva Tardos; Drishti Wali

Abstract

We study the stochastic multi-armed bandit problem with the graph-based feedback structure introduced by Mannor and Shamir. We analyze the performance of the two most prominent stochastic bandit algorithms, Thompson Sampling and Upper Confidence Bound (UCB), in the graph-based feedback setting. We show that these algorithms achieve regret guarantees that combine the graph structure and the gaps between the means of the arm distributions. Surprisingly, this holds despite the fact that these algorithms do not explicitly use the graph structure to select arms; they observe the additional feedback but do not explore based on it. Towards this result, we introduce a layering technique highlighting the commonalities in the two algorithms.
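
The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' code or analysis. It assumes Bernoulli rewards, the standard UCB1 confidence radius, a Beta(1, 1) prior for Thompson Sampling, and a hypothetical `neighbors` mapping from each arm to the arms whose rewards are revealed when it is played. The function name `run_with_graph_feedback` and all parameters are illustrative. The point it shows is that arm selection never consults the graph; the graph only determines which extra observations are recorded.

```python
import math
import random


def run_with_graph_feedback(means, neighbors, horizon, policy="ucb", seed=0):
    """Simulate UCB1 or Thompson Sampling on Bernoulli arms with graph feedback.

    means:     hypothetical true mean of each arm (assumption for the simulation).
    neighbors: neighbors[i] is the collection of arms observed when arm i is played
               (the played arm itself is always observed).
    Returns (pull counts, observation counts, pseudo-regret).
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k    # how often each arm was played
    obs = [0] * k      # how often each arm's reward was observed
    sums = [0.0] * k   # sum of observed rewards per arm
    best = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if policy == "ucb":
            # UCB1 index built from observation counts; unobserved arms go first.
            scores = [
                float("inf") if obs[i] == 0
                else sums[i] / obs[i] + math.sqrt(2.0 * math.log(t) / obs[i])
                for i in range(k)
            ]
        else:
            # Thompson Sampling: draw from the Beta posterior of each arm.
            scores = [
                rng.betavariate(1.0 + sums[i], 1.0 + obs[i] - sums[i])
                for i in range(k)
            ]

        # Selection uses only the scores above, never the graph.
        arm = max(range(k), key=lambda i: scores[i])
        pulls[arm] += 1
        regret += best - means[arm]

        # Graph feedback: the played arm and its neighbors each reveal one sample.
        for j in {arm} | set(neighbors[arm]):
            reward = 1.0 if rng.random() < means[j] else 0.0
            obs[j] += 1
            sums[j] += reward

    return pulls, obs, regret
```

With an empty graph (`neighbors[i]` empty for every arm) this reduces to the ordinary bandit setting, where observation counts equal pull counts. With a complete graph every arm is observed in every round, so the observation counts all equal the horizon.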

Bibliographic details

Source
JMLR: Workshop and Conference Proceedings, 2020, Issue 2, 23 pages
Authors
Thodoris Lykouris; Éva Tardos; Drishti Wali
Original format
PDF
Chinese Library Classification
Artificial intelligence theory
Keywords
Stochastic multi-armed bandits; feedback graphs; Thompson Sampling
