...
JMLR: Workshop and Conference Proceedings

Regret Analysis of Bandit Problems with Causal Background Knowledge


Abstract

We study how to learn optimal interventions sequentially, given causal information represented as a causal graph along with its associated conditional distributions. Causal modeling is useful in real-world problems such as online advertising, where complex causal mechanisms underlie the relationship between interventions and outcomes. We propose two algorithms, causal upper confidence bound (C-UCB) and causal Thompson Sampling (C-TS), that enjoy improved cumulative regret bounds compared with algorithms that do not use causal information. We thus resolve an open problem posed by Lattimore et al. (2016). Further, we extend C-UCB and C-TS to the linear bandit setting and propose causal linear UCB (CL-UCB) and causal linear TS (CL-TS) algorithms. These algorithms enjoy a cumulative regret bound that scales only with the feature dimension. Our experiments show the benefit of using causal information. For example, we observe that even after a few hundred iterations, the regret of the causal algorithms is smaller than that of the standard algorithms by a factor of three. We also show that under certain causal structures, our algorithms scale better than standard bandit algorithms as the number of interventions increases.
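The abstract's core idea can be illustrated with a small sketch. When the reward Y depends on the chosen intervention only through a post-intervention variable Z whose conditional distribution P(Z | do(a)) is known from the causal graph, one pull updates shared estimates of E[Y | Z = z], so every arm's value estimate improves at once. The sketch below is an illustrative toy (two arms, binary Z, Bernoulli rewards), not the paper's exact C-UCB algorithm or experimental setup; all function names and parameters are assumptions for exposition.

```python
import math
import random

def run_causal_ucb(p_z_given_a, mu_y_given_z, horizon, seed=0):
    """Toy UCB bandit exploiting known P(Z | do(a)) (illustrative sketch).

    p_z_given_a[a][z] : P(Z = z | do(A = a)), known to the learner.
    mu_y_given_z[z]   : E[Y | Z = z], unknown; rewards are Bernoulli.
    Returns the cumulative regret over `horizon` rounds.
    """
    rng = random.Random(seed)
    n_arms = len(p_z_given_a)
    n_z = len(mu_y_given_z)
    counts_z = [0] * n_z    # how often each value of Z was observed
    sums_z = [0.0] * n_z    # total reward observed for each value of Z
    true_means = [sum(p[z] * mu_y_given_z[z] for z in range(n_z))
                  for p in p_z_given_a]
    best = max(true_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        def index(a):
            # UCB index built from per-z estimates shared across all arms
            val = 0.0
            for z in range(n_z):
                if counts_z[z] == 0:
                    return float("inf")  # force exploration of unseen z
                mu_hat = sums_z[z] / counts_z[z]
                bonus = math.sqrt(2 * math.log(t) / counts_z[z])
                val += p_z_given_a[a][z] * (mu_hat + bonus)
            return val
        arm = max(range(n_arms), key=index)
        # Sample Z from the chosen intervention, then a Bernoulli reward
        u, z, acc = rng.random(), n_z - 1, 0.0
        for zz in range(n_z):
            acc += p_z_given_a[arm][zz]
            if u < acc:
                z = zz
                break
        y = 1.0 if rng.random() < mu_y_given_z[z] else 0.0
        counts_z[z] += 1
        sums_z[z] += y
        regret += best - true_means[arm]
    return regret
```

Because the per-z estimates are shared, a pull of any arm tightens the confidence intervals of all arms simultaneously, which is the intuition behind the improved regret bounds claimed in the abstract.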
