Semiparametric Contextual Bandits

Akshay Krishnamurthy; Zhiwei Steven Wu; Vasilis Syrgkanis

Abstract

This paper studies semiparametric contextual bandits, a generalization of the linear stochastic bandit problem where the reward for a chosen action is modeled as a linear function of known action features confounded by a non-linear action-independent term. We design new algorithms that achieve $\tilde{O}(d\sqrt{T})$ regret over $T$ rounds, when the linear function is $d$-dimensional, which matches the best known bounds for the simpler unconfounded case and improves on a recent result of Greenewald et al. (2017). Via an empirical evaluation, we show that our algorithms outperform prior approaches when there are non-linear confounding effects on the rewards. Technically, our algorithms use a new reward estimator inspired by doubly-robust approaches and our proofs require new concentration inequalities for self-normalized martingales.
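
The reward model described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm or estimator; it only shows, under assumed values, why an action-independent confounder leaves the gaps between actions unchanged. The function name `semiparametric_rewards`, the dimensions, and the two confounder values are hypothetical choices made for illustration.

```python
import math
import random


def semiparametric_rewards(theta, features, confounder):
    """Expected reward per action: <theta, z_a> plus an action-independent offset."""
    return [sum(t * z for t, z in zip(theta, z_a)) + confounder for z_a in features]


random.seed(0)
d, num_actions = 3, 4  # hypothetical sizes
theta = [random.gauss(0, 1) for _ in range(d)]  # stands in for the unknown parameter
features = [[random.gauss(0, 1) for _ in range(d)] for _ in range(num_actions)]

# Same round, two very different (arbitrary) confounder values.
r_small = semiparametric_rewards(theta, features, confounder=0.1)
r_large = semiparametric_rewards(theta, features, confounder=50 * math.sin(7.0))

# The confounder shifts every action's reward by the same amount, so the
# reward gaps relative to action 0, and hence the best action, do not change.
gaps_small = [r - r_small[0] for r in r_small]
gaps_large = [r - r_large[0] for r in r_large]
same_gaps = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(gaps_small, gaps_large))
best_small = max(range(num_actions), key=lambda a: r_small[a])
best_large = max(range(num_actions), key=lambda a: r_large[a])

print(same_gaps, best_small == best_large)
```

Because the offset is shared across actions, regret depends only on the linear part, which is what makes a $d$-dimensional regret bound plausible even though the confounder itself is never modeled.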

Bibliographic details

Source: JMLR: Workshop and Conference Proceedings | 2018, issue 2009 | 10 pages
Authors: Akshay Krishnamurthy; Zhiwei Steven Wu; Vasilis Syrgkanis
Original format: PDF
Chinese Library Classification: Artificial intelligence theory

Similar literature

1. Online Residential Demand Response via Contextual Multi-Armed Bandits [J]. Chen Xin, Nie Yutong, Li Na. IEEE Control Systems Letters. 2021, issue 2.
2. Contextual Bandit Approach-based Recommendation System for Personalized Web-based Services [J]. Pilani Akshay, Mathur Kritagya, Agrawal Himanshu. Applied Artificial Intelligence. 2021, issue 5a8.
3. Statistical Inference for Online Decision Making: In a Contextual Bandit Setting [J]. Chen Haoyu, Lu Wenbin, Song Rui. Journal of the American Statistical Association. 2021, issue 533.
4. Contextual Multi-armed Bandit Algorithm for Semiparametric Reward Model [C]. Gi-Soo Kim, Myunghee Cho Paik. International Conference on Machine Learning. 2019.
5. Using Contextual Bandits to Improve Traffic Performance in Edge Network [D]. Al Zadjali, Aziza Najeeb. 2021.
6. Action Centered Contextual Bandits [O]. Kristjan Greenewald, Ambuj Tewari, Predrag Klasnja.
7. Context Attentive Bandits: Contextual Bandit with Restricted Context [O]. Bouneffouf, Djallel; Rish, Irina; Cecchi, Guillermo A. 2017.
