Conference paper, Annual Conference on Neural Information Processing Systems

Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring

Abstract

Partial monitoring is a general model for sequential learning with limited feedback, formalized as a game between two players. In this game, the learner chooses an action while the opponent simultaneously chooses an outcome; the learner then suffers a loss and receives a feedback signal. The goal of the learner is to minimize the total loss. In this paper, we study partial monitoring with finite actions and stochastic outcomes. We derive a logarithmic distribution-dependent regret lower bound that characterizes the hardness of the problem. Inspired by the DMED algorithm (Honda and Takemura, 2010) for the multi-armed bandit problem, we propose PM-DMED, an algorithm that minimizes the distribution-dependent regret. PM-DMED significantly outperforms state-of-the-art algorithms in numerical experiments. To show the optimality of PM-DMED with respect to the regret bound, we slightly modify the algorithm by introducing a hinge function (PM-DMED-Hinge). We then derive an asymptotically optimal regret upper bound for PM-DMED-Hinge that matches the lower bound.
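The abstract describes the game protocol only in words. The following is a minimal sketch of one finite stochastic partial monitoring game under the usual setup. It assumes a loss matrix `LOSS`, a feedback matrix `FEEDBACK` of signal symbols, and a fixed outcome distribution `p` that is unknown to the learner.

The toy instance, every function name, and the explore-then-commit learner are hypothetical illustrations. They are not PM-DMED and not the paper's experimental setup. Regret here means pseudo-regret: the sum over rounds of the gap between the chosen action's expected loss and the best action's expected loss.

```python
import random

# Hypothetical toy instance (not from the paper): N = 3 actions, M = 2 outcomes.
# LOSS[i][j] is the loss of action i when the outcome is j.
LOSS = [
    [0.0, 1.0],  # action 0: good when outcome 0 occurs
    [1.0, 0.0],  # action 1: good when outcome 1 occurs
    [0.4, 0.4],  # action 2: constant loss, but informative feedback
]
# FEEDBACK[i][j] is the signal observed after playing action i under outcome j.
# Only action 2 reveals the outcome; actions 0 and 1 always show the same symbol.
FEEDBACK = [
    ["x", "x"],
    ["x", "x"],
    ["a", "b"],
]


def expected_loss(action, p):
    """Expected loss of `action` when outcomes are drawn from distribution p."""
    return sum(LOSS[action][j] * p[j] for j in range(len(p)))


def pseudo_regret(actions, p):
    """Sum over rounds of (expected loss of the chosen action - best expected loss)."""
    best = min(expected_loss(i, p) for i in range(len(LOSS)))
    return sum(expected_loss(i, p) - best for i in actions)


def play(T, p, choose_action, seed=0):
    """Run T rounds; the learner sees only (action, signal) pairs, never losses."""
    rng = random.Random(seed)
    history, actions, total_loss = [], [], 0.0
    for t in range(T):
        i = choose_action(t, history)                 # learner picks an action
        j = rng.choices(range(len(p)), weights=p)[0]  # outcome drawn i.i.d. from p
        total_loss += LOSS[i][j]                      # loss is suffered, not observed
        history.append((i, FEEDBACK[i][j]))           # only the signal is revealed
        actions.append(i)
    return actions, history, total_loss


def explore_then_commit(k):
    """Naive placeholder learner (NOT PM-DMED): play the revealing action 2 for
    k rounds, estimate p from its signals, then commit to the empirically best action."""
    def choose_action(t, history):
        if t < k:
            return 2
        n0 = sum(1 for _, signal in history[:k] if signal == "a")  # "a" means outcome 0
        p_hat = [n0 / k, (k - n0) / k]
        return min(range(len(LOSS)), key=lambda i: expected_loss(i, p_hat))
    return choose_action


p_true = [0.3, 0.7]  # unknown to the learner
actions, history, total_loss = play(1000, p_true, explore_then_commit(50))
print(pseudo_regret(actions, p_true))
```

In this toy instance, action 2 is the only action whose signal reveals the outcome, yet it is never optimal. Every exploration round therefore adds regret. Balancing this trade-off between information and loss is the kind of problem that distribution-dependent regret bounds and algorithms such as PM-DMED address.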