Conference: European Conference on Machine Learning and Knowledge Discovery in Databases

A Time and Space Efficient Algorithm for Contextual Linear Bandits



Abstract

We consider a multi-armed bandit problem in which payoffs are a linear function of an observed stochastic contextual variable. When there is a gap between optimal and suboptimal rewards, several algorithms have been proposed that achieve O(log T) regret after T time steps. However, the proposed methods either have a per-iteration computational complexity that scales linearly with T or incur regret that grows linearly with the number of contexts |X|. We propose an ε-greedy type of algorithm that overcomes both limitations. In particular, when contexts are variables in R^d, we prove that our algorithm has a constant per-iteration computational complexity of O(poly(d)) and can achieve a regret of O(poly(d) log T) even when |X| = Ω(2^d). In addition, unlike previous algorithms, its space complexity scales as O(Kd^2) and does not grow with T.
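To make the complexity claims concrete, here is a minimal sketch of a generic ε-greedy linear bandit. It is not the authors' algorithm: the class name, the per-arm ridge-regression estimator, the regularizer `reg`, and the decaying schedule `eps_scale / t` are all assumptions chosen for illustration, and K is taken to be the number of arms. It only shows how keeping one d x d inverse Gram matrix per arm, updated with the Sherman-Morrison formula, gives O(Kd^2) storage and a per-step cost that does not depend on T.

```python
import random


class EpsGreedyLinearBandit:
    """Illustrative epsilon-greedy bandit with one ridge estimate per arm."""

    def __init__(self, n_arms, dim, reg=1.0, eps_scale=10.0, seed=0):
        self.n_arms, self.dim, self.eps_scale = n_arms, dim, eps_scale
        self.rng = random.Random(seed)
        self.t = 0
        # Per arm: inverse Gram matrix (d x d) and response vector (d).
        # Total storage is O(K d^2) and never grows with the horizon T.
        self.A_inv = [[[(1.0 / reg if i == j else 0.0) for j in range(dim)]
                       for i in range(dim)] for _ in range(n_arms)]
        self.b = [[0.0] * dim for _ in range(n_arms)]

    def _theta(self, arm):
        # theta_hat = A^{-1} b, an O(d^2) matrix-vector product.
        return [sum(row[j] * self.b[arm][j] for j in range(self.dim))
                for row in self.A_inv[arm]]

    def select(self, x):
        self.t += 1
        # Assumed exploration schedule: explore with probability min(1, c / t).
        eps = min(1.0, self.eps_scale / self.t)
        if self.rng.random() < eps:
            return self.rng.randrange(self.n_arms)
        scores = [sum(th * xi for th, xi in zip(self._theta(a), x))
                  for a in range(self.n_arms)]
        return max(range(self.n_arms), key=scores.__getitem__)

    def update(self, arm, x, reward):
        # Sherman-Morrison rank-one update of (A + x x^T)^{-1} in O(d^2):
        # A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x)
        A_inv = self.A_inv[arm]
        u = [sum(row[j] * x[j] for j in range(self.dim)) for row in A_inv]
        denom = 1.0 + sum(xi * ui for xi, ui in zip(x, u))
        for i in range(self.dim):
            for j in range(self.dim):
                A_inv[i][j] -= u[i] * u[j] / denom
        for j in range(self.dim):
            self.b[arm][j] += reward * x[j]


def simulate(steps=500, seed=1):
    """Toy run with two arms and hypothetical true parameters."""
    rng = random.Random(seed)
    true_theta = [[1.0, 0.0], [0.0, 1.0]]  # made-up values for the demo
    bandit = EpsGreedyLinearBandit(n_arms=2, dim=2, seed=seed)
    for _ in range(steps):
        x = [rng.random(), rng.random()]
        arm = bandit.select(x)
        mean = sum(th * xi for th, xi in zip(true_theta[arm], x))
        bandit.update(arm, x, mean + rng.gauss(0.0, 0.1))
    return bandit
```

In this sketch each call to `select` costs O(Kd^2) and each call to `update` costs O(d^2), regardless of how many rounds have elapsed. The regret guarantee stated in the abstract depends on the authors' specific exploration scheme and analysis, which this sketch does not attempt to reproduce.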
