Journal of the American Statistical Association

Statistical Inference for Online Decision Making: In a Contextual Bandit Setting



Abstract

The online decision-making problem requires us to make a sequence of decisions based on incremental information. Common solutions typically need to learn a reward model of the different actions given the contextual information and then maximize the long-term reward. It is meaningful to know whether the posited model is reasonable and how the model performs in the asymptotic sense. We study this problem under the setup of the contextual bandit framework with a linear reward model. The epsilon-greedy policy is adopted to address the classic exploration-exploitation dilemma. Using the martingale central limit theorem, we show that the online ordinary least squares estimator of the model parameters is asymptotically normal. When the linear model is misspecified, we propose the online weighted least squares estimator using inverse propensity score weighting and also establish its asymptotic normality. Based on the properties of the parameter estimators, we further show that the in-sample inverse propensity weighted value estimator is asymptotically normal. We illustrate our results using simulations and an application to a news article recommendation dataset from Yahoo!. Supplementary materials for this article are available online.
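The following is a minimal sketch (not the authors' code) of the setup the abstract describes: an epsilon-greedy contextual bandit with a linear reward model, a per-arm online ordinary least squares estimator maintained through its sufficient statistics, and the in-sample inverse-propensity-weighted value estimate V_hat = (1/T) * sum_t 1{a_t = greedy_t(x_t)} * r_t / p_t, where p_t is the propensity of the chosen arm under the epsilon-greedy policy. The synthetic data generator, the arm count K, and eps = 0.1 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic setup: K arms, d-dimensional contexts,
# true linear reward model r = x' beta_a + noise.
K, d, T, eps = 2, 3, 5000, 0.1
beta_true = rng.normal(size=(K, d))

# Per-arm sufficient statistics for the online OLS estimator
# beta_hat_a = (X_a' X_a)^{-1} X_a' y_a, updated incrementally.
XtX = np.stack([np.eye(d) * 1e-6 for _ in range(K)])  # tiny ridge for invertibility
Xty = np.zeros((K, d))

ipw_terms = []  # terms of the in-sample IPW value estimate

for t in range(T):
    x = rng.normal(size=d)  # observe the context
    beta_hat = np.array([np.linalg.solve(XtX[a], Xty[a]) for a in range(K)])
    greedy = int(np.argmax(beta_hat @ x))  # exploit the current OLS estimates

    # epsilon-greedy: explore uniformly with probability eps, else act greedily
    a = int(rng.integers(K)) if rng.random() < eps else greedy
    # propensity of the chosen arm under the epsilon-greedy policy
    prop = (1 - eps) + eps / K if a == greedy else eps / K

    r = x @ beta_true[a] + rng.normal()  # observe the reward

    # online least squares update for the chosen arm
    XtX[a] += np.outer(x, x)
    Xty[a] += r * x

    # IPW value term: reweight the reward by 1/propensity whenever the
    # action taken matches the greedy action being evaluated
    ipw_terms.append((a == greedy) * r / prop)

beta_final = np.array([np.linalg.solve(XtX[a], Xty[a]) for a in range(K)])
print("IPW value estimate:", np.mean(ipw_terms))
print("online OLS estimates:\n", beta_final)
```

Under misspecification, the weighted variant proposed in the paper would reweight each least-squares update by the inverse propensity (i.e., multiply both the XtX and Xty increments by 1/prop); the sketch keeps the unweighted OLS update for clarity.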


