Conference on Uncertainty in Artificial Intelligence

Finite-Time Analysis of Kernelised Contextual Bandits


Abstract

We tackle the problem of online reward maximisation over a large finite set of actions described by their contexts. We focus on the case when the number of actions is too big to sample all of them even once. However, we assume that we have access to the similarities between actions' contexts and that the expected reward is an arbitrary linear function of the contexts' images in the related reproducing kernel Hilbert space (RKHS). We propose KernelUCB, a kernelised UCB algorithm, and give a cumulative regret bound through a frequentist analysis. For contextual bandits, the related algorithm GP-UCB turns out to be a special case of our algorithm, and our finite-time analysis improves the regret bound of GP-UCB for the agnostic case, both in terms of the kernel-dependent quantity and the RKHS norm of the reward function. Moreover, for the linear kernel, our regret bound matches the lower bound for contextual linear bandits.
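To make the idea concrete, the following is a minimal sketch of one round of a kernelised UCB rule in the spirit the abstract describes: score each action by a kernel-ridge-regression estimate of its reward plus an uncertainty width, then play the highest-scoring action. The RBF kernel, the exploration parameter `eta`, the regularisation `lam`, and all function names are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    # RBF kernel matrix between the rows of X and the rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ucb_round(X_hist, y_hist, X_arms, lam=1.0, eta=1.0):
    """One round of a kernelised UCB rule (sketch, not the paper's exact form).

    X_hist : (t, d) contexts of previously played actions
    y_hist : (t,)   observed rewards
    X_arms : (K, d) contexts of the currently available actions
    Returns the index of the action maximising mean + eta * width.
    """
    if len(X_hist) == 0:
        return 0  # no data yet: play an arbitrary action
    K = rbf(X_hist, X_hist)
    K_inv = np.linalg.inv(K + lam * np.eye(len(X_hist)))
    scores = []
    for x in X_arms:
        k_x = rbf(X_hist, x[None, :]).ravel()
        mu = k_x @ K_inv @ y_hist                         # ridge mean estimate
        var = rbf(x[None, :], x[None, :])[0, 0] - k_x @ K_inv @ k_x
        scores.append(mu + eta * np.sqrt(max(var, 0.0)))  # optimistic score
    return int(np.argmax(scores))
```

The width term shrinks for actions whose contexts are similar (under the kernel) to already-sampled ones, which is what lets the rule cover an action set too large to sample exhaustively.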