Top-$k$ Combinatorial Bandits with Full-Bandit Feedback

Idan Rejwan; Yishay Mansour

Abstract

Top-$k$ combinatorial bandits generalize multi-armed bandits: at each round, any subset of $k$ out of $n$ arms may be chosen, and the sum of their rewards is gained. We address full-bandit feedback, in which the agent observes only the sum of the rewards, in contrast to semi-bandit feedback, in which the agent also observes the individual arms' rewards. We present the Combinatorial Successive Accepts and Rejects (CSAR) algorithm, which generalizes SAR (Bubeck et al., 2013) to top-$k$ combinatorial bandits. Our main contribution is an efficient sampling scheme that uses Hadamard matrices to accurately estimate the individual arms' expected rewards. We discuss two variants of the algorithm: the first minimizes the sample complexity and the second minimizes the regret. We also prove a lower bound on the sample complexity, which is tight for $k=O(1)$. Finally, we run experiments and show that our algorithm outperforms other methods.
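
The idea of recovering individual arm means from sum-only observations with a Hadamard matrix can be illustrated with a minimal sketch. This is not the paper's CSAR algorithm or its exact sampling scheme; it is a simplified illustration under stated assumptions: the number of arms is $n = 2k$ and a power of two, the noise is Gaussian, and the names `sylvester_hadamard`, `pull_subset`, and `estimate_means` are hypothetical helpers introduced here. Each non-constant row of the Hadamard matrix splits the arms into two subsets of size $k$; the difference of the two observed sums estimates one coordinate of $H\theta$, and orthogonality of $H$ inverts the transform.

```python
import numpy as np


def sylvester_hadamard(m):
    """Return the 2^m x 2^m Sylvester Hadamard matrix with +1/-1 entries."""
    H = np.array([[1]])
    base = np.array([[1, 1], [1, -1]])
    for _ in range(m):
        H = np.kron(H, base)
    return H


def pull_subset(theta, subset, noise_std, rng):
    """Full-bandit feedback: only the noisy SUM of the chosen arms is returned."""
    return theta[subset].sum() + noise_std * rng.standard_normal()


def estimate_means(theta, noise_std=0.1, repeats=200, seed=0):
    """Estimate each arm's mean from subset sums only (illustrative assumption:
    n = 2k arms, n a power of two, so every subset pulled has size k = n/2)."""
    theta = np.asarray(theta, dtype=float)
    n = len(theta)
    m = n.bit_length() - 1
    assert 2 ** m == n and n >= 2, "this sketch assumes n is a power of two"
    H = sylvester_hadamard(m)
    rng = np.random.default_rng(seed)

    z = np.zeros(n)  # z will approximate H @ theta
    for i in range(1, n):
        plus = np.flatnonzero(H[i] == 1)    # arms with sign +1 in row i (size n/2)
        minus = np.flatnonzero(H[i] == -1)  # arms with sign -1 in row i (size n/2)
        y_plus = np.mean([pull_subset(theta, plus, noise_std, rng)
                          for _ in range(repeats)])
        y_minus = np.mean([pull_subset(theta, minus, noise_std, rng)
                           for _ in range(repeats)])
        z[i] = y_plus - y_minus
        if i == 1:
            # Row 0 of H is all ones; the two halves together cover every arm.
            z[0] = y_plus + y_minus

    # H.T @ H = n * I, so dividing by n inverts the transform.
    return H.T @ z / n


theta = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])
estimate = estimate_means(theta)
top_k = sorted(np.argsort(estimate)[-4:].tolist())  # indices of the 4 largest estimates
```

In this sketch every pull uses exactly $k = n/2$ arms, so it respects the top-$k$ action constraint while still producing an estimate for each individual arm.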

Bibliographic details

Source
JMLR: Workshop and Conference Proceedings | 2020, Issue 4 | 25 pages
Authors
Idan Rejwan; Yishay Mansour
Original format: PDF
Chinese Library Classification: Artificial intelligence theory
Keywords
Multi-Armed Bandits; Combinatorial Bandits; Top-k Bandits; Hadamard Matrix; Sample Complexity; Regret Minimization; Experimental Design

Similar literature

1. Polynomial-Time Algorithms for Multiple-Arm Identification with Full-Bandit Feedback [J]. Kuroki Yuko, Xu Liyuan, Miyauchi Atsushi. Neural Computation. 2020, Issue 9.
2. Local Weighted Matrix Factorization for Top-n Recommendation with Implicit Feedback [J]. Keqiang Wang, Hongwei Peng, Yuanyuan Jin. Data Science and Engineering. 2016, Issue 4.
3. Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits [J]. Thibaut Cuvelier, Richard Combes, Eric Gourdin. Performance Evaluation Review. 2021, Issue 1.
4. Combinatorial Bandits with Relative Feedback [C]. Aadirupa Saha, Aditya Gopalan. Conference on Neural Information Processing Systems. 2020.
5. Adaptive Preference Learning with Bandit Feedback: Information Filtering, Dueling Bandits and Incentivizing Exploration [D]. Chen, Bangrui. 2017.
6. PNAS Plus: Output-driven feedback system control platform optimizes combinatorial therapy of tuberculosis using a macrophage cell culture model [O]. Aleidy Silva, Bai-Yu Lee, Daniel L. Clemens. 2016.
7. Polynomial-Time Algorithms for Multiple-Arm Identification with Full-Bandit Feedback [O]. Yuko Kuroki, Liyuan Xu, Atsushi Miyauchi. 2020.
