Taking a hint: How to leverage loss predictors in contextual bandits?

Chen-Yu Wei; Haipeng Luo; Alekh Agarwal

Abstract

We initiate the study of learning in contextual bandits with the help of loss predictors. The main question we address is whether one can improve over the minimax regret $\mathcal{O}(\sqrt{T})$ for learning over $T$ rounds, when the total error of the predicted losses relative to the realized losses, denoted as $\mathcal{E} \leq T$, is relatively small. We provide a complete answer to this question, with upper and lower bounds for various settings: adversarial and stochastic environments, known and unknown $\mathcal{E}$, and single and multiple predictors. We show several surprising results, such as 1) the optimal regret is $\mathcal{O}(\min\{\sqrt{T}, \sqrt{\mathcal{E}}T^{\frac{1}{4}}\})$ when $\mathcal{E}$ is known, in contrast to the standard and better bound $\mathcal{O}(\sqrt{\mathcal{E}})$ for non-contextual problems (such as multi-armed bandits); 2) the same bound cannot be achieved if $\mathcal{E}$ is unknown, but as a remedy, $\mathcal{O}(\sqrt{\mathcal{E}}T^{\frac{1}{3}})$ is achievable; 3) with $M$ predictors, a linear dependence on $M$ is necessary, even though logarithmic dependence is possible for non-contextual problems. We also develop several novel algorithmic techniques to achieve matching upper bounds, including 1) a key \emph{action remapping} technique for optimal regret with known $\mathcal{E}$, 2) computationally efficient implementation of Catoni’s robust mean estimator via an ERM oracle in the stochastic setting with optimal regret, 3) an underestimator for $\mathcal{E}$ via estimating the histogram with bins of exponentially increasing size for the stochastic setting with unknown $\mathcal{E}$, and 4) a self-referential scheme for learning with multiple predictors, all of which might be of independent interest.
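To get a feel for how the rates quoted in the abstract compare, the following is a minimal numerical sketch. It only evaluates the stated orders of growth with all constants and logarithmic factors dropped, so the numbers are illustrative rather than actual regret values. The function names are hypothetical and do not come from the paper.

```python
import math

def known_error_rate(T, E):
    """Order of the optimal regret when the total error E is known:
    min{sqrt(T), sqrt(E) * T^(1/4)} (constants and log factors dropped)."""
    return min(math.sqrt(T), math.sqrt(E) * T ** 0.25)

def unknown_error_rate(T, E):
    """Order of the achievable regret when E is unknown:
    sqrt(E) * T^(1/3) (constants and log factors dropped)."""
    return math.sqrt(E) * T ** (1.0 / 3.0)

def non_contextual_rate(E):
    """Order of the standard non-contextual bound: sqrt(E)."""
    return math.sqrt(E)

if __name__ == "__main__":
    T = 10 ** 4
    for E in (1, 25, 100, 10 ** 4):
        print(
            f"E={E:>6}  known={known_error_rate(T, E):8.1f}  "
            f"unknown={unknown_error_rate(T, E):8.1f}  "
            f"non-contextual={non_contextual_rate(E):8.1f}  "
            f"minimax={math.sqrt(T):8.1f}"
        )
```

Under these simplifications, the known-$\mathcal{E}$ rate improves on $\sqrt{T}$ exactly when $\mathcal{E} < \sqrt{T}$, while the unknown-$\mathcal{E}$ rate $\sqrt{\mathcal{E}}T^{\frac{1}{3}}$ falls below $\sqrt{T}$ only when $\mathcal{E} < T^{\frac{1}{3}}$.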

Bibliographic details

Source: JMLR: Workshop and Conference Proceedings, 2020, No. 2010, 52 pages
Authors: Chen-Yu Wei; Haipeng Luo; Alekh Agarwal
Original format: PDF

Similar literature

1. Online Residential Demand Response via Contextual Multi-Armed Bandits [J]. Chen Xin, Nie Yutong, Li Na. IEEE Control Systems Letters. 2021, No. 2
2. Contextual Bandit Approach-based Recommendation System for Personalized Web-based Services [J]. Pilani Akshay, Mathur Kritagya, Agrawal Himanshu. Applied Artificial Intelligence. 2021, No. 5a8
3. Statistical Inference for Online Decision Making: In a Contextual Bandit Setting [J]. Chen Haoyu, Lu Wenbin, Song Rui. Journal of the American Statistical Association. 2021, No. 533
4. How Predictable is Your State? Leveraging Lexical and Contextual Information for Predicting Legislative Floor Action at the State Level [C]. Vlad Eidelman, Anastassia Kornilova, Daniel Argyle. International Conference on Computational Linguistics. 2018
5. Using Contextual Bandits to Improve Traffic Performance in Edge Network [D]. Al Zadjali, Aziza Najeeb. 2021
6. Action Centered Contextual Bandits [O]. Kristjan Greenewald, Ambuj Tewari, Predrag Klasnja.
7. Context Attentive Bandits: Contextual Bandit with Restricted Context [O]. Bouneffouf, Djallel; Rish, Irina; Cecchi, Guillermo A. 2017
