JMLR: Workshop and Conference Proceedings

Randomized Exploration for Non-Stationary Stochastic Linear Bandits


Abstract

We investigate two perturbation approaches to overcome the conservatism that optimism-based algorithms chronically suffer from in practice. The first approach replaces optimism with simple randomization when using confidence sets. The second adds random perturbations to the current estimate before maximizing the expected reward. For non-stationary linear bandits, where each action is associated with a $d$-dimensional feature and the unknown parameter is time-varying with total variation $B_T$, we propose two randomized algorithms, Discounted Randomized LinUCB (D-RandLinUCB) and Discounted Linear Thompson Sampling (D-LinTS), via these two perturbation approaches. We highlight the statistical-optimality versus computational-efficiency trade-off between them: the former asymptotically achieves the optimal dynamic regret $\tilde{O}(d^{2/3} B_T^{1/3} T^{2/3})$, while the latter is oracle-efficient at the cost of an extra logarithmic factor in the number of arms relative to the minimax-optimal dynamic regret. In a simulation study, both algorithms show outstanding performance in tackling the conservatism issue that Discounted LinUCB struggles with.
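To make the two perturbation approaches concrete, below is a minimal sketch of one round of each, assuming a discounted ridge-regression estimate with a Gaussian perturbation scaled by the inverse discounted Gram matrix. The paper's exact perturbation distributions, discounting scheme, and tuning constants may differ; the names gamma, lam, and beta and the specific update rule are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 5          # feature dimension
gamma = 0.99   # discount factor handling non-stationarity (hypothetical value)
lam = 1.0      # ridge regularization
beta = 1.0     # exploration scale (hypothetical tuning knob)

V = lam * np.eye(d)   # discounted Gram matrix
b = np.zeros(d)       # discounted reward-weighted feature sum

def update(x, y):
    """Discount past data, then incorporate the new observation (x, y)."""
    global V, b
    V = gamma * V + np.outer(x, x) + (1 - gamma) * lam * np.eye(d)
    b = gamma * b + y * x

def d_randlinucb_arm(arms):
    """First approach (D-RandLinUCB style): replace optimism with a random
    confidence-width multiplier Z, shared across arms within the round,
    instead of always taking the upper confidence bound."""
    theta_hat = np.linalg.solve(V, b)
    V_inv = np.linalg.inv(V)
    Z = rng.normal()  # random scalar in place of a fixed optimistic width
    def index(x):
        width = np.sqrt(x @ V_inv @ x)
        return x @ theta_hat + Z * beta * width
    return max(arms, key=index)

def d_lints_arm(arms):
    """Second approach (D-LinTS style): perturb the current estimate,
    then greedily maximize the estimated expected reward."""
    theta_hat = np.linalg.solve(V, b)
    cov = beta**2 * np.linalg.inv(V)
    theta_tilde = rng.multivariate_normal(theta_hat, cov)
    return max(arms, key=lambda x: x @ theta_tilde)
```

The sketch reflects the trade-off stated in the abstract: d_lints_arm only needs a single linear-reward maximization over the arm set (oracle-efficient), whereas d_randlinucb_arm evaluates a randomized confidence-set index per arm.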