Published in: JMLR: Workshop and Conference Proceedings

Analysis of Thompson Sampling for Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms


Abstract

We analyze the regret of combinatorial Thompson sampling (CTS) for the combinatorial multi-armed bandit with probabilistically triggered arms under the semi-bandit feedback setting. We assume that the learner has access to an exact optimization oracle but does not know the expected base arm outcomes beforehand. When the expected reward function is Lipschitz continuous in the expected base arm outcomes, we derive an $O(\sum_{i=1}^m \log T / (p_i \Delta_i))$ regret bound for CTS, where $m$ denotes the number of base arms, $p_i$ denotes the minimum non-zero triggering probability of base arm $i$ and $\Delta_i$ denotes the minimum suboptimality gap of base arm $i$. We also compare CTS with combinatorial upper confidence bound (CUCB) via numerical experiments on a cascading bandit problem.
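To make the setting concrete, the following is a minimal sketch of CTS on a disjunctive cascading bandit. It is an illustration under stated assumptions, not the paper's experimental code: the Bernoulli outcomes, the Beta(1, 1) priors, the function name `cts_cascading`, and the example attraction probabilities are all hypothetical choices. In each round the learner samples a mean for every base arm from its posterior, passes the samples to an exact oracle (here, the top-k arms by sampled mean), and updates only the arms that were actually triggered, that is, the items examined up to and including the first click.

```python
import random


def cts_cascading(weights, k, horizon, seed=0):
    """Sketch of CTS on a disjunctive cascading bandit (assumed setup).

    weights: true attraction probabilities of the base arms (unknown to the learner).
    Returns (cumulative expected regret, Beta alpha counts, Beta beta counts).
    """
    rng = random.Random(seed)
    m = len(weights)
    a = [1] * m  # Beta(1, 1) prior: 1 + number of observed clicks
    b = [1] * m  # 1 + number of observed non-clicks

    def expected_reward(items):
        # Probability that at least one listed item attracts the user.
        miss = 1.0
        for i in items:
            miss *= 1.0 - weights[i]
        return 1.0 - miss

    best = sorted(range(m), key=lambda i: weights[i], reverse=True)[:k]
    opt = expected_reward(best)

    regret = 0.0
    for _ in range(horizon):
        # Sample one plausible mean per base arm from its posterior.
        theta = [rng.betavariate(a[i], b[i]) for i in range(m)]
        # Exact oracle for this reward: the k largest sampled means, in order.
        action = sorted(range(m), key=lambda i: theta[i], reverse=True)[:k]
        regret += opt - expected_reward(action)
        # Semi-bandit feedback with triggering: the user scans the list and
        # stops at the first click, so later items are never observed.
        for i in action:
            if rng.random() < weights[i]:
                a[i] += 1
                break
            b[i] += 1
    return regret, a, b


if __name__ == "__main__":
    total_regret, _, _ = cts_cascading([0.3, 0.25, 0.2, 0.1, 0.05], k=2, horizon=5000)
    print(f"cumulative expected regret: {total_regret:.2f}")
```

Items placed late in the list are observed only when every earlier item fails to attract a click, which is why the bound above depends on the triggering probabilities $p_i$: arms that are rarely triggered accumulate observations slowly.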
