JMLR: Workshop and Conference Proceedings

Contextual Bandits with Stochastic Experts

Abstract

We consider the problem of contextual bandits with stochastic experts, which is a variation of the traditional stochastic contextual bandit with experts problem. In our problem setting, we assume access to a class of stochastic experts, where each expert is a conditional distribution over the arms given a context. We propose upper confidence bound (UCB) algorithms for this problem, which employ two different importance-sampling-based estimators for the mean reward of each expert. Both estimators leverage information leakage among the experts, using samples collected under all the experts to estimate the mean reward of any given expert. This leads to instance-dependent regret bounds of $\mathcal{O}\left(\lambda(\pmb{\mu})\mathcal{M}\log T/\Delta\right)$, where $\lambda(\pmb{\mu})$ is a term that depends on the mean rewards of the experts, $\Delta$ is the smallest gap between the mean reward of the optimal expert and the rest, and $\mathcal{M}$ quantifies the information leakage among the experts. We show that under some assumptions $\lambda(\pmb{\mu})$ is typically $\mathcal{O}(\log N)$, where $N$ is the number of experts. We implement our algorithm with stochastic experts generated from cost-sensitive classification oracles and show superior empirical performance on real-world datasets compared to other state-of-the-art contextual bandit algorithms.
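The abstract describes the estimators only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' method. It illustrates the core idea of information leakage: every observed (context, arm, reward) is reweighted for each expert by the ratio of that expert's probability of the played arm to the probability under the expert that was actually played, so one sample updates the estimate of every expert. The class `ImportanceSampledUCB`, the weight cap `clip`, the empirical-Bernstein-style bonus and its constants, and the toy two-arm simulation are all hypothetical choices made for illustration; the paper's two estimators and its confidence widths (which involve $\mathcal{M}$) are not reproduced here.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical interface: an expert maps a context to a probability vector
# over arms, i.e. a conditional distribution over arms given the context.
Expert = Callable[[Sequence[float]], List[float]]


class ImportanceSampledUCB:
    """Sketch of a UCB rule over stochastic experts that share all samples."""

    def __init__(self, experts: List[Expert], clip: float = 10.0) -> None:
        self.experts = experts
        self.clip = clip  # assumed cap on importance weights (variance control)
        self.n = 0  # total number of rounds observed, under any expert
        self.sums = [0.0] * len(experts)  # running sum of weighted rewards
        self.sq_sums = [0.0] * len(experts)  # running sum of their squares

    def update(self, context: Sequence[float], arm: int, reward: float,
               behaviour: List[float]) -> None:
        """Record one round; `behaviour` is the played expert's arm distribution."""
        self.n += 1
        for k, expert in enumerate(self.experts):
            # Importance weight: target probability / behaviour probability, capped.
            weight = min(expert(context)[arm] / behaviour[arm], self.clip)
            x = weight * reward
            self.sums[k] += x
            self.sq_sums[k] += x * x

    def mean(self, k: int) -> float:
        """Importance-weighted estimate of expert k's mean reward."""
        return self.sums[k] / self.n if self.n else 0.0

    def index(self, k: int, t: int) -> float:
        """Mean estimate plus an empirical-Bernstein-style bonus (illustrative constants)."""
        if self.n == 0:
            return float("inf")
        m = self.mean(k)
        var = max(self.sq_sums[k] / self.n - m * m, 0.0)
        log_t = math.log(max(t, 2))
        return m + math.sqrt(2.0 * var * log_t / self.n) + 3.0 * self.clip * log_t / self.n

    def choose(self, t: int) -> int:
        """Play the expert with the largest upper confidence index."""
        scores = [self.index(k, t) for k in range(len(self.experts))]
        return max(range(len(scores)), key=scores.__getitem__)


def simulate(rounds: int = 2000, seed: int = 0) -> List[int]:
    """Toy run: hypothetical Bernoulli arms, context-independent experts."""
    rng = random.Random(seed)
    arm_means = [0.8, 0.2]
    experts: List[Expert] = [
        lambda ctx: [0.9, 0.1],  # mean 0.9*0.8 + 0.1*0.2 = 0.74 (best)
        lambda ctx: [0.5, 0.5],  # mean 0.50
        lambda ctx: [0.1, 0.9],  # mean 0.26
    ]
    agent = ImportanceSampledUCB(experts)
    picks = [0] * len(experts)
    for t in range(1, rounds + 1):
        context = [rng.random()]
        k = agent.choose(t)
        probs = experts[k](context)
        arm = rng.choices(range(len(probs)), weights=probs)[0]
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        agent.update(context, arm, reward, probs)
        picks[k] += 1
    return picks


if __name__ == "__main__":
    print(simulate())
```

Calling `simulate()` returns how many times each expert was played. With these toy arms the first expert (mean 0.74) should be played most often, although exact counts depend on the seed. Because all experts here put positive probability on every arm, the ratios stay bounded and every expert's estimate keeps improving no matter which expert is played; experts whose distributions differ more from the played one get noisier estimates, and hence larger variance-driven bonuses.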