Conference on Uncertainty in Artificial Intelligence

Finite-Time Analysis of Kernelised Contextual Bandits


Abstract

We tackle the problem of online reward maximisation over a large finite set of actions described by their contexts. We focus on the case when the number of actions is too big to sample all of them even once. However, we assume that we have access to the similarities between actions' contexts and that the expected reward is an arbitrary linear function of the contexts' images in the related reproducing kernel Hilbert space (RKHS). We propose KernelUCB, a kernelised UCB algorithm, and give a cumulative regret bound through a frequentist analysis. For contextual bandits, the related algorithm GP-UCB turns out to be a special case of our algorithm, and our finite-time analysis improves the regret bound of GP-UCB for the agnostic case, both in terms of the kernel-dependent quantity and the RKHS norm of the reward function. Moreover, for the linear kernel, our regret bound matches the lower bound for contextual linear bandits.
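To make the idea concrete, the following is a minimal sketch of one round of a kernelised UCB rule in the spirit the abstract describes: score each action by a kernel-ridge-regression estimate of its reward plus an uncertainty width, then play the highest-scoring action. The RBF kernel, the exploration parameter `eta`, the regularisation `lam`, and all function names are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    # RBF kernel matrix between the rows of X and the rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ucb_round(X_hist, y_hist, X_arms, lam=1.0, eta=1.0):
    """One round of a kernelised UCB rule (sketch, not the paper's exact form).

    X_hist : (t, d) contexts of previously played actions
    y_hist : (t,)   observed rewards
    X_arms : (K, d) contexts of the currently available actions
    Returns the index of the action maximising mean + eta * width.
    """
    if len(X_hist) == 0:
        return 0  # no data yet: play an arbitrary action
    K = rbf(X_hist, X_hist)
    K_inv = np.linalg.inv(K + lam * np.eye(len(X_hist)))
    scores = []
    for x in X_arms:
        k_x = rbf(X_hist, x[None, :]).ravel()
        mu = k_x @ K_inv @ y_hist                         # ridge mean estimate
        var = rbf(x[None, :], x[None, :])[0, 0] - k_x @ K_inv @ k_x
        scores.append(mu + eta * np.sqrt(max(var, 0.0)))  # optimistic score
    return int(np.argmax(scores))
```

The width term shrinks for actions whose contexts are similar (under the kernel) to already-sampled ones, which is what lets the rule cover an action set too large to sample exhaustively.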