Journal of Machine Learning Research

Regret Bounds and Minimax Policies under Partial Monitoring



Abstract

This work deals with four classical prediction settings, namely full information, bandit, label efficient, and bandit label efficient, as well as four different notions of regret: pseudo-regret, expected regret, high-probability regret, and tracking-the-best-expert regret. We introduce a new forecaster, INF (Implicitly Normalized Forecaster), based on an arbitrary function $\psi$, and we propose a unified analysis of its pseudo-regret in the four games we consider. In particular, for $\psi(x) = \exp(\eta x) + \gamma/K$, INF reduces to the classical exponentially weighted average forecaster: our analysis of the pseudo-regret recovers known results, while for the expected regret we slightly tighten the bounds. On the other hand, with $\psi(x) = (\eta/(-x))^q + \gamma/K$, which defines a new forecaster, we are able to remove the extraneous logarithmic factor in the pseudo-regret bounds for bandit games, and thus close a long-standing gap in the characterization of the minimax rate for the pseudo-regret in the bandit game. We also provide high-probability bounds that depend on the cumulative reward of the optimal action. Finally, we consider the stochastic bandit game and prove that an appropriate modification of the upper confidence bound policy UCB1 (Auer et al., 2002a) achieves the distribution-free optimal rate while still having a distribution-dependent rate that is logarithmic in the number of plays.
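The implicit normalization behind INF can be illustrated with a short sketch. It assumes that, at each round, the forecaster looks for a constant $C$ such that $\sum_i \psi(\tilde G_i - C) = 1$, where $\tilde G_i$ is an estimate of arm $i$'s cumulative gain, and then plays arm $i$ with probability $p_i = \psi(\tilde G_i - C)$. The helper names (`exp_psi`, `poly_psi`, `inf_probabilities`, `run_bandit_inf`), the bisection solver, the importance-weighted estimate $g/p_i$ for the played arm, the Bernoulli test arms, and every parameter value are illustrative assumptions, not the paper's tuned constants.

```python
import math
import random
from typing import Callable, List, Tuple

Psi = Callable[[float], float]


def exp_psi(eta: float, gamma: float, K: int) -> Tuple[Psi, float]:
    """psi(x) = exp(eta * x) + gamma / K, plus a bracket width for C (hypothetical helper)."""
    def psi(x: float) -> float:
        return math.exp(eta * x) + gamma / K
    # At C = max(G) + width, each term is at most (1 - gamma)/K + gamma/K, so the sum is <= 1.
    width = math.log(K / (1.0 - gamma)) / eta
    return psi, width


def poly_psi(eta: float, q: float, gamma: float, K: int) -> Tuple[Psi, float]:
    """psi(x) = (eta / (-x))^q + gamma / K, defined for x < 0 (hypothetical helper)."""
    def psi(x: float) -> float:
        if x >= 0.0:
            return math.inf  # outside the domain: pushes the root search to larger C
        return (eta / -x) ** q + gamma / K
    width = eta * (K / (1.0 - gamma)) ** (1.0 / q)
    return psi, width


def inf_probabilities(gains: List[float], psi: Psi, width: float,
                      iters: int = 60) -> List[float]:
    """Bisection for C with sum_i psi(gains[i] - C) = 1 (psi assumed increasing)."""
    lo = max(gains)        # the sum of psi(g - lo) exceeds 1 here
    hi = lo + width        # the sum of psi(g - hi) is at most 1 here
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(psi(g - mid) for g in gains) > 1.0:
            lo = mid
        else:
            hi = mid
    p = [psi(g - hi) for g in gains]
    s = sum(p)
    return [x / s for x in p]  # absorb the remaining bisection error


def run_bandit_inf(means: List[float], n: int,
                   make_psi: Callable[[int], Tuple[Psi, float]],
                   seed: int = 0) -> float:
    """Bandit INF on hypothetical Bernoulli arms; returns the total gain collected."""
    rng = random.Random(seed)
    K = len(means)
    psi, width = make_psi(K)
    est = [0.0] * K  # importance-weighted cumulative gain estimates
    total = 0.0
    for _ in range(n):
        p = inf_probabilities(est, psi, width)
        i = rng.choices(range(K), weights=p)[0]
        g = 1.0 if rng.random() < means[i] else 0.0
        total += g
        est[i] += g / p[i]  # only the played arm's estimate is updated
    return total


if __name__ == "__main__":
    n, means = 2000, [0.3, 0.5, 0.7]
    # Illustrative parameters only (assumed, not taken from the paper).
    exp_like = lambda K: exp_psi(eta=0.05, gamma=0.05, K=K)
    poly = lambda K: poly_psi(eta=math.sqrt(n), q=2.0, gamma=0.0, K=K)
    print("exp psi:", run_bandit_inf(means, n, exp_like))
    print("poly psi:", run_bandit_inf(means, n, poly))
```

With the exponential $\psi$ the constant $C$ has a closed form, so the sketch just reproduces exponential weights mixed with a $\gamma/K$ uniform floor; with the polynomial $\psi$ no closed form is available in general, which is why a one-dimensional root search is used for both.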

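For the stochastic bandit game, a UCB1 modification of the kind described in the abstract can be illustrated with an index of the form used by the MOSS policy, $\hat\mu_i + \sqrt{\log^+(n/(K T_i))/T_i}$, where $n$ is the horizon, $K$ the number of arms, and $T_i$ the number of pulls of arm $i$ so far. Treat this exact index, the Bernoulli test arms, and the helper names `moss_index` and `run_moss` as assumptions for illustration; the paper gives the precise policy and its constants.

```python
import math
import random
from typing import List


def moss_index(mean: float, pulls: int, n: int, K: int) -> float:
    """Assumed MOSS-style index: empirical mean plus sqrt(log^+(n / (K * pulls)) / pulls)."""
    log_plus = max(math.log(n / (K * pulls)), 0.0)
    return mean + math.sqrt(log_plus / pulls)


def run_moss(means: List[float], n: int, seed: int = 0) -> List[int]:
    """Play hypothetical Bernoulli arms for n >= K rounds; return pull counts per arm."""
    rng = random.Random(seed)
    K = len(means)
    pulls = [0] * K
    sums = [0.0] * K
    for t in range(n):
        if t < K:
            i = t  # pull each arm once so every index is defined
        else:
            i = max(range(K),
                    key=lambda a: moss_index(sums[a] / pulls[a], pulls[a], n, K))
        reward = 1.0 if rng.random() < means[i] else 0.0
        pulls[i] += 1
        sums[i] += reward
    return pulls


if __name__ == "__main__":
    print(run_moss([0.2, 0.5, 0.8], n=5000))
```

Unlike UCB1's exploration bonus $\sqrt{2\ln t / T_i}$, this bonus depends on the horizon $n$ and drops to zero once an arm has been pulled at least $n/K$ times; intuitively, this is the kind of change that removes the extra logarithmic factor from the distribution-free bound while keeping a logarithmic distribution-dependent rate.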