
Linear Bayes policy for learning in contextual-bandits


Abstract

Machine and Statistical Learning techniques are used in almost all online advertisement systems. The problem of discovering which content is in higher demand (e.g., receives more clicks) can be modeled as a multi-armed bandit problem. Contextual bandits (i.e., bandits with covariates, side information, or associative reinforcement learning) associate with each piece of content several features that define the "context" in which it appears (e.g., user, web page, time, region). This problem can be studied in the stochastic/statistical setting by means of the conditional-probability paradigm, using Bayes' theorem. However, for very large contextual information and/or under real-time constraints, exact computation of Bayes' rule is infeasible. In this article, we present a method that can handle large contextual information for learning in contextual-bandit problems. The method was tested on the Yahoo! dataset in the challenge at ICML 2012's workshop "New Challenges for Exploration & Exploitation 3", where it placed second. Its basic exploration policy is deterministic in the sense that the same input data (as a time series) always yields the same results. We address the deterministic exploration-vs.-exploitation issue, explaining how the proposed method deterministically finds an effective dynamic trade-off based solely on the input data, in contrast to other methods that rely on a random number generator.
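To make the setting concrete, here is a minimal sketch of a contextual bandit with a per-arm linear Bayesian reward model and a deterministic (greedy posterior-mean) policy. The class names and the argmax rule are illustrative assumptions only; the paper's actual exploration policy is more involved and is not reproduced here.

```python
import numpy as np

class LinearBayesArm:
    """Per-arm Bayesian linear model with a Gaussian posterior over weights.

    Maintains A = I + sum(x x^T) and b = sum(r * x), so the posterior mean
    of the weight vector is theta = A^{-1} b (the ridge-regression form).
    """
    def __init__(self, dim):
        self.A = np.eye(dim)    # identity prior precision, accumulates x x^T
        self.b = np.zeros(dim)  # accumulates reward-weighted contexts

    def predict(self, x):
        # Posterior-mean estimate of the expected reward for context x.
        theta = np.linalg.solve(self.A, self.b)
        return float(theta @ x)

    def update(self, x, reward):
        # Incorporate one observed (context, reward) pair.
        self.A += np.outer(x, x)
        self.b += reward * x

def choose_arm(arms, x):
    """Deterministic policy: pick the arm with the highest posterior-mean
    reward. Same input history and context always give the same choice;
    no random number generator is involved."""
    scores = [arm.predict(x) for arm in arms]
    return int(np.argmax(scores))
```

Because the policy depends only on the accumulated statistics (A, b) and the current context, replaying the same time series of inputs reproduces the same sequence of decisions, which is the determinism property the abstract highlights.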
