...
JMLR: Workshop and Conference Proceedings

Regret Analysis of Bandit Problems with Causal Background Knowledge


Abstract

We study how to learn optimal interventions sequentially, given causal information represented as a causal graph along with its associated conditional distributions. Causal modeling is useful in real-world problems such as online advertising, where complex causal mechanisms underlie the relationship between interventions and outcomes. We propose two algorithms, causal upper confidence bound (C-UCB) and causal Thompson Sampling (C-TS), that enjoy improved cumulative regret bounds compared with algorithms that do not use causal information. We thus resolve an open problem posed by Lattimore et al. (2016). Further, we extend C-UCB and C-TS to the linear bandit setting and propose causal linear UCB (CL-UCB) and causal linear TS (CL-TS) algorithms. These algorithms enjoy a cumulative regret bound that scales only with the feature dimension. Our experiments show the benefit of using causal information. For example, we observe that even after a few hundred iterations, the regret of the causal algorithms is smaller than that of the standard algorithms by a factor of three. We also show that under certain causal structures, our algorithms scale better than standard bandit algorithms as the number of interventions increases.
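The abstract's core idea can be illustrated with a small sketch. When the reward Y depends on the chosen intervention only through a post-intervention variable Z whose conditional distribution P(Z | do(a)) is known from the causal graph, one pull updates shared estimates of E[Y | Z = z], so every arm's value estimate improves at once. The sketch below is an illustrative toy (two arms, binary Z, Bernoulli rewards), not the paper's exact C-UCB algorithm or experimental setup; all function names and parameters are assumptions for exposition.

```python
import math
import random

def run_causal_ucb(p_z_given_a, mu_y_given_z, horizon, seed=0):
    """Toy UCB bandit exploiting known P(Z | do(a)) (illustrative sketch).

    p_z_given_a[a][z] : P(Z = z | do(A = a)), known to the learner.
    mu_y_given_z[z]   : E[Y | Z = z], unknown; rewards are Bernoulli.
    Returns the cumulative regret over `horizon` rounds.
    """
    rng = random.Random(seed)
    n_arms = len(p_z_given_a)
    n_z = len(mu_y_given_z)
    counts_z = [0] * n_z    # how often each value of Z was observed
    sums_z = [0.0] * n_z    # total reward observed for each value of Z
    true_means = [sum(p[z] * mu_y_given_z[z] for z in range(n_z))
                  for p in p_z_given_a]
    best = max(true_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        def index(a):
            # UCB index built from per-z estimates shared across all arms
            val = 0.0
            for z in range(n_z):
                if counts_z[z] == 0:
                    return float("inf")  # force exploration of unseen z
                mu_hat = sums_z[z] / counts_z[z]
                bonus = math.sqrt(2 * math.log(t) / counts_z[z])
                val += p_z_given_a[a][z] * (mu_hat + bonus)
            return val
        arm = max(range(n_arms), key=index)
        # Sample Z from the chosen intervention, then a Bernoulli reward
        u, z, acc = rng.random(), n_z - 1, 0.0
        for zz in range(n_z):
            acc += p_z_given_a[arm][zz]
            if u < acc:
                z = zz
                break
        y = 1.0 if rng.random() < mu_y_given_z[z] else 0.0
        counts_z[z] += 1
        sums_z[z] += y
        regret += best - true_means[arm]
    return regret
```

Because the per-z estimates are shared, a pull of any arm tightens the confidence intervals of all arms simultaneously, which is the intuition behind the improved regret bounds claimed in the abstract.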
