Regret Bounds and Minimax Policies under Partial Monitoring

Audibert Jean-Yves; Bubeck Sébastien


Abstract

This work deals with four classical prediction settings, namely full information, bandit, label efficient and bandit label efficient, as well as four different notions of regret: pseudo-regret, expected regret, high probability regret and tracking the best expert regret. We introduce a new forecaster, INF (Implicitly Normalized Forecaster), based on an arbitrary function ψ, for which we propose a unified analysis of its pseudo-regret in the four games we consider. In particular, for ψ(x) = exp(η x) + γ/K, INF reduces to the classical exponentially weighted average forecaster and our analysis of the pseudo-regret recovers known results, while for the expected regret we slightly tighten the bounds. On the other hand, with ψ(x) = (η/(-x))^q + γ/K, which defines a new forecaster, we are able to remove the extraneous logarithmic factor in the pseudo-regret bounds for bandit games, and thus fill in a long open gap in the characterization of the minimax rate for the pseudo-regret in the bandit game. We also provide high probability bounds depending on the cumulative reward of the optimal action. Finally, we consider the stochastic bandit game, and prove that an appropriate modification of the upper confidence bound policy UCB1 (Auer et al., 2002a) achieves the distribution-free optimal rate while still having a distribution-dependent rate logarithmic in the number of plays.
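The abstract gives only the two choices of ψ, not the forecaster's update rule, so the following Python sketch is an illustration under stated assumptions rather than the authors' exact procedure. It assumes that "implicitly normalized" means the probabilities are p_i = ψ(V_i - C), where V_i is a cumulative gain estimate for arm i and the constant C is chosen so that the p_i sum to 1. It also assumes 0 ≤ γ < 1, finds C by bisection, and uses a standard importance-weighted gain estimate for the bandit update. All function names are hypothetical.

```python
import math
import random

def make_exp_psi(eta, gamma, K):
    # psi(x) = exp(eta * x) + gamma / K, the first choice in the abstract
    return lambda x: math.exp(eta * x) + gamma / K

def make_poly_psi(eta, gamma, K, q):
    # psi(x) = (eta / (-x))^q + gamma / K, defined for x < 0
    return lambda x: (eta / (-x)) ** q + gamma / K

def inf_probabilities(V, psi, iters=100):
    """Assumed normalization: p_i = psi(V_i - C) with C solving sum_i p_i = 1."""
    top = max(V)
    total = lambda C: sum(psi(v - C) for v in V)
    # psi is increasing, so total(C) decreases in C; the root lies above max(V).
    width = 1.0
    while total(top + width) > 1.0:  # terminates because gamma < 1 (assumption)
        width *= 2.0
    lo, hi = top, top + width
    for _ in range(iters):  # plain bisection on C
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    p = [psi(v - hi) for v in V]
    s = sum(p)
    return [x / s for x in p]  # renormalize to absorb numerical error

def inf_bandit_step(V, psi, gains, rng):
    """One bandit round: draw an arm, then add the importance-weighted gain."""
    p = inf_probabilities(V, psi)
    arm = rng.choices(range(len(V)), weights=p)[0]
    V[arm] += gains[arm] / p[arm]  # assumed estimator; only this arm's gain is seen
    return arm, p
```

With `make_exp_psi(eta, 0.0, K)` the resulting probabilities are proportional to exp(η V_i), which matches the abstract's statement that this choice of ψ reduces INF to the exponentially weighted average forecaster.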


Record details

Source
Journal of Machine Learning Research | 2010, Issue 10 | 52 pages
Authors
Audibert Jean-Yves; Bubeck Sébastien
Original format: PDF
Chinese Library Classification: Computing technology, computer technology

