Partial Monitoring-Classification, Regret Bounds, and Algorithms

Bartok Gabor; Foster Dean P.; Pal David; Rakhlin Alexander; Szepesvari Csaba

Abstract

In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his regret, which is the difference between his total cumulative loss and the total loss of the best fixed action in hindsight. In this paper we characterize the minimax regret of any partial monitoring game with finitely many actions and outcomes. It turns out that the minimax regret of any such game is either zero or scales as $T^{1/2}$, $T^{2/3}$, or $T$ (where $T$ is the number of rounds), up to constants and logarithmic factors. We provide computationally efficient learning algorithms that achieve the minimax regret within a logarithmic factor for any game. In addition to the bounds on the minimax regret, if we assume that the outcomes are generated in an i.i.d. fashion, we prove individual upper bounds on the expected regret.
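
To make the protocol and the regret definition above concrete, here is a minimal Python sketch. It assumes finitely many actions and outcomes given as a loss matrix and a feedback matrix indexed by (action, outcome), and an environment that fixes its outcome sequence in advance. The function name `play_partial_monitoring`, the `learner` callback interface, and the toy game are hypothetical illustrations, not one of the algorithms from the paper. The sketch shows that the learner's loss is accumulated but never passed back to it: only the feedback signal enters its history. Regret is then measured against the best fixed action in hindsight.

```python
from typing import Callable, List, Sequence, Tuple

# One observation available to the learner: (action played, feedback signal seen).
Observation = Tuple[int, object]


def play_partial_monitoring(
    loss: Sequence[Sequence[float]],
    feedback: Sequence[Sequence[object]],
    outcomes: Sequence[int],
    learner: Callable[[List[Observation]], int],
) -> float:
    """Run a finite partial monitoring game and return the learner's regret.

    loss[i][j]:     loss of action i when the outcome is j (never shown to the learner)
    feedback[i][j]: signal the learner receives for action i under outcome j
    outcomes:       outcome sequence, fixed in advance (assumption: oblivious environment)
    learner:        hypothetical interface mapping past observations to the next action
    """
    history: List[Observation] = []
    learner_loss = 0.0
    for j in outcomes:
        i = learner(history)                 # learner chooses an action
        learner_loss += loss[i][j]           # loss is suffered but not revealed
        history.append((i, feedback[i][j]))  # only the feedback signal is observed
    # Cumulative loss of the best single action in hindsight over the same outcomes.
    best_fixed = min(sum(loss[a][j] for j in outcomes) for a in range(len(loss)))
    return learner_loss - best_fixed


if __name__ == "__main__":
    # Hypothetical toy game with 2 actions and 2 outcomes: action 0 observes
    # nothing useful, while action 1 reveals the outcome through its signal.
    L = [[0.0, 1.0], [1.0, 0.0]]
    H = [["none", "none"], ["saw 0", "saw 1"]]
    seq = [0, 1, 1, 0, 1]
    print(play_partial_monitoring(L, H, seq, lambda hist: 0))  # 1.0
```

In this toy run, the learner that always plays action 0 ends with cumulative loss 3.0, while the best fixed action (action 1) has loss 2.0, so its regret is 1.0. Because action 0 returns the same signal under both outcomes, the learner's history alone cannot reveal that switching would have helped.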

Bibliographic details

Source
Mathematics of Operations Research, 2014, Issue 4, 31 pages
Authors
Bartok Gabor; Foster Dean P.; Pal David; Rakhlin Alexander; Szepesvari Csaba
Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Operations research
Keywords
repeated games; partial monitoring; imperfect information; regret analysis

Similar articles

1. Partial Monitoring-Classification, Regret Bounds, and Algorithms [J]. Bartok Gabor, Foster Dean P., Pal David, Mathematics of Operations Research. 2014, Issue 4

2. Learning to Rank: Regret Lower Bounds and Efficient Algorithms [J]. Richard Combes, Stefan Magureanu, Alexandre Proutiere, Performance Evaluation Review. 2015, Issue 1

3. Bandits with Budgets: Regret Lower Bounds and Optimal Algorithms [J]. Richard Combes, Chong Jiang, Rayadurgam Srikant, Performance Evaluation Review. 2015, Issue 1

4. Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring [C]. Junpei Komiyama, Junya Honda, Hiroshi Nakagawa, Annual Conference on Neural Information Processing Systems. 2015

5. From Stability to Low-Regret Algorithms in Stochastic Multi-Armed Bandits [D]. Huang, Kuan-Sung. 2021

6. Heuristic algorithms for the minmax regret flow-shop problem with interval processing times [O]. Michał Ćwik, Jerzy Józefczyk

7. Partial monitoring – classification, regret bounds, and algorithms [O]. Gábor Bartók, Dean Foster, Dávid Pál Alex, 2014
