Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring



Abstract

Partial monitoring is a general model for sequential learning with limited feedback formalized as a game between two players. In this game, the learner chooses an action and at the same time the opponent chooses an outcome, then the learner suffers a loss and receives a feedback signal. The goal of the learner is to minimize the total loss. In this paper, we study partial monitoring with finite actions and stochastic outcomes. We derive a logarithmic distribution-dependent regret lower bound that defines the hardness of the problem. Inspired by the DMED algorithm (Honda and Takemura, 2010) for the multi-armed bandit problem, we propose PM-DMED, an algorithm that minimizes the distribution-dependent regret. PM-DMED significantly outperforms state-of-the-art algorithms in numerical experiments. To show the optimality of PM-DMED with respect to the regret bound, we slightly modify the algorithm by introducing a hinge function (PM-DMED-Hinge). Then, we derive an asymptotically optimal regret upper bound of PM-DMED-Hinge that matches the lower bound.
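The interaction protocol described in the abstract can be sketched as a short simulation. This is only an illustrative sketch under stated assumptions: the loss matrix `LOSS`, the feedback matrix `FEEDBACK`, the outcome distribution `p`, and the uniformly random `uniform_policy` are all hypothetical and are not taken from the paper, and the placeholder policy is not PM-DMED. Regret is measured here as pseudo-regret, the cumulative gap in expected loss to the best fixed action, which is one common definition.

```python
import random

# Hypothetical 3-action, 2-outcome game (illustrative values, not from the paper).
LOSS = [[0.0, 1.0],   # action 0
        [1.0, 0.0],   # action 1
        [0.6, 0.6]]   # action 2: costly but informative
FEEDBACK = [["x", "x"],   # actions 0 and 1 return the same symbol for every outcome
            ["x", "x"],
            ["a", "b"]]   # action 2 reveals which outcome occurred


def expected_losses(loss, p):
    """Expected loss of each action when outcomes are drawn i.i.d. from p."""
    return [sum(l * q for l, q in zip(row, p)) for row in loss]


def play(policy, p, horizon, seed=0):
    """Run the game for `horizon` rounds and return the pseudo-regret."""
    rng = random.Random(seed)
    exp_loss = expected_losses(LOSS, p)
    best = min(exp_loss)
    history = []  # the learner only ever sees (action, feedback symbol) pairs
    regret = 0.0
    for _ in range(horizon):
        action = policy(history, rng)
        outcome = rng.choices(range(len(p)), weights=p)[0]  # stochastic opponent
        signal = FEEDBACK[action][outcome]  # the loss itself is never observed
        history.append((action, signal))
        regret += exp_loss[action] - best
    return regret


def uniform_policy(history, rng):
    """Placeholder learner that ignores feedback; NOT the PM-DMED algorithm."""
    return rng.randrange(len(LOSS))


regret = play(uniform_policy, p=[0.8, 0.2], horizon=1000)
print(f"pseudo-regret of the uniform policy: {regret:.1f}")
```

Because the uniform policy never uses the feedback, its regret grows linearly with the horizon; the logarithmic bound in the abstract concerns algorithms that exploit the feedback signals.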


Bibliographic record

Source
Annual Conference on Neural Information Processing Systems | 2015 | pp. 1792-1800 | 9 pages
Authors
Junpei Komiyama; Junya Honda; Hiroshi Nakagawa
Original format: PDF

