Source: JMLR: Workshop and Conference Proceedings

Top-$k$ Combinatorial Bandits with Full-Bandit Feedback

Abstract

Top-$k$ combinatorial bandits generalize multi-armed bandits: at each round, any subset of $k$ out of $n$ arms may be chosen, and the agent gains the sum of their rewards. We address the full-bandit feedback setting, in which the agent observes only the sum of rewards, in contrast to semi-bandit feedback, in which the agent also observes the individual arms’ rewards. We present the Combinatorial Successive Accepts and Rejects (CSAR) algorithm, which generalizes SAR (Bubeck et al., 2013) to top-$k$ combinatorial bandits. Our main contribution is an efficient sampling scheme that uses Hadamard matrices to accurately estimate the individual arms’ expected rewards. We discuss two variants of the algorithm: the first minimizes the sample complexity, and the second minimizes the regret. We also prove a lower bound on sample complexity, which is tight for $k=O(1)$. Finally, we run experiments and show that our algorithm outperforms other methods.
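
To illustrate how Hadamard matrices can recover individual arm means when only the sum of rewards is observed, here is a minimal sketch. It is not necessarily the exact CSAR sampling scheme. It assumes a block of $m = 2k$ arms with $m$ a power of two, and the names `sylvester_hadamard`, `estimate_means`, and `pull_subset` are hypothetical. Every non-constant row of the matrix splits the block into two subsets of $k$ arms; playing both and subtracting the observed sums gives one signed linear measurement of the means, and orthogonality of the rows inverts the measurements.

```python
def sylvester_hadamard(m):
    """Return an m x m Hadamard matrix (m a power of two, m >= 2)
    built with Sylvester's doubling construction."""
    if m < 2 or m & (m - 1):
        raise ValueError("m must be a power of two and at least 2")
    H = [[1]]
    while len(H) < m:
        # [[H, H], [H, -H]] doubles the size and keeps rows orthogonal.
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H


def estimate_means(pull_subset, m, rounds=1):
    """Estimate the expected rewards of arms 0..m-1 from sum-only feedback.

    pull_subset(arms) is a hypothetical environment callback (an assumption
    of this sketch): it returns the possibly noisy sum of the rewards of the
    listed arms. Each call made here uses exactly m // 2 arms.
    """
    H = sylvester_hadamard(m)
    z = [0.0] * m  # z[i] accumulates an estimate of the inner product H[i] . theta
    for _ in range(rounds):
        for i in range(1, m):
            plus = [j for j in range(m) if H[i][j] == 1]
            minus = [j for j in range(m) if H[i][j] == -1]
            r_plus = pull_subset(plus)
            r_minus = pull_subset(minus)
            z[i] += r_plus - r_minus
            if i == 1:
                # Row 0 is all ones: the two halves together give the total.
                z[0] += r_plus + r_minus
    z = [v / rounds for v in z]
    # H^T H = m I, so theta = H^T z / m.
    return [sum(H[i][j] * z[i] for i in range(m)) / m for j in range(m)]
```

With noise-free feedback the estimate is exact up to floating-point error; with noisy rewards, increasing `rounds` averages the noise down. The sketch spends $2(m-1)$ subset plays per round to estimate $m$ means, each play using exactly $k = m/2$ arms.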