
Linear Bayes policy for learning in contextual-bandits


Abstract

Machine and Statistical Learning techniques are used in almost all online advertisement systems. The problem of discovering which content is in higher demand (e.g., receives more clicks) can be modeled as a multi-armed bandit problem. Contextual bandits (i.e., bandits with covariates, side information, or associative reinforcement learning) associate with each piece of content several features that define the "context" in which it appears (e.g., user, web page, time, region). This problem can be studied in the stochastic/statistical setting by means of the conditional-probability paradigm, using Bayes' theorem. However, for very large contextual information and/or under real-time constraints, exact computation of Bayes' rule is infeasible. In this article, we present a method that can handle large contextual information for learning in contextual-bandit problems. The method was tested on the Yahoo! dataset in the challenge at ICML 2012's workshop "New Challenges for Exploration & Exploitation 3", where it placed second. Its basic exploration policy is deterministic in the sense that the same input data (as a time series) always yields the same results. We address the deterministic exploration-vs.-exploitation issue, explaining how the proposed method deterministically finds an effective dynamic trade-off based solely on the input data, in contrast to other methods that rely on a random number generator.
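To make the setting concrete, here is a minimal sketch of a contextual bandit with a per-arm linear Bayesian reward model and a deterministic (greedy posterior-mean) policy. The class names and the argmax rule are illustrative assumptions only; the paper's actual exploration policy is more involved and is not reproduced here.

```python
import numpy as np

class LinearBayesArm:
    """Per-arm Bayesian linear model with a Gaussian posterior over weights.

    Maintains A = I + sum(x x^T) and b = sum(r * x), so the posterior mean
    of the weight vector is theta = A^{-1} b (the ridge-regression form).
    """
    def __init__(self, dim):
        self.A = np.eye(dim)    # identity prior precision, accumulates x x^T
        self.b = np.zeros(dim)  # accumulates reward-weighted contexts

    def predict(self, x):
        # Posterior-mean estimate of the expected reward for context x.
        theta = np.linalg.solve(self.A, self.b)
        return float(theta @ x)

    def update(self, x, reward):
        # Incorporate one observed (context, reward) pair.
        self.A += np.outer(x, x)
        self.b += reward * x

def choose_arm(arms, x):
    """Deterministic policy: pick the arm with the highest posterior-mean
    reward. Same input history and context always give the same choice;
    no random number generator is involved."""
    scores = [arm.predict(x) for arm in arms]
    return int(np.argmax(scores))
```

Because the policy depends only on the accumulated statistics (A, b) and the current context, replaying the same time series of inputs reproduces the same sequence of decisions, which is the determinism property the abstract highlights.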
