Published in: Mathematics of Operations Research

Partial Monitoring-Classification, Regret Bounds, and Algorithms


Abstract

In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his regret, which is the difference between his total cumulative loss and the total loss of the best fixed action in hindsight. In this paper we characterize the minimax regret of any partial monitoring game with finitely many actions and outcomes. It turns out that the minimax regret of any such game is either zero or scales as T^{1/2}, T^{2/3}, or T up to constants and logarithmic factors. We provide computationally efficient learning algorithms that achieve the minimax regret within a logarithmic factor for any game. In addition to the bounds on the minimax regret, if we assume that the outcomes are generated in an i.i.d. fashion, we prove individual upper bounds on the expected regret.
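The protocol and the regret defined in the abstract can be illustrated with a minimal sketch. The two-action, two-outcome game below, and the names `LOSS`, `FEEDBACK`, `play`, and `uniform_learner`, are assumptions made up for illustration; they are not taken from the paper, and the learner shown is a naive baseline rather than one of the paper's algorithms.

```python
import random

# Hypothetical game (illustrative only): rows are actions, columns are outcomes.
LOSS = [[0.0, 1.0],
        [1.0, 0.0]]
# Feedback is a fixed function of (action, outcome), like the loss.
# Action 0 always yields the same signal, so it reveals nothing about the outcome;
# action 1 yields a different signal per outcome, so it reveals the outcome.
FEEDBACK = [["a", "a"],
            ["b", "c"]]


def play(learner, outcomes):
    """Run the game on a fixed outcome sequence and return the learner's regret.

    `learner` maps the history of (action, feedback signal) pairs to the next
    action. It never observes the outcome or its own loss directly.
    """
    history = []
    learner_loss = 0.0
    for outcome in outcomes:
        action = learner(history)
        learner_loss += LOSS[action][outcome]
        history.append((action, FEEDBACK[action][outcome]))
    # Total loss of the best fixed action in hindsight.
    best_fixed_loss = min(
        sum(LOSS[a][o] for o in outcomes) for a in range(len(LOSS))
    )
    return learner_loss - best_fixed_loss


def uniform_learner(history):
    """Naive baseline (an assumption, not from the paper): ignore all feedback."""
    return random.randrange(len(LOSS))
```

Calling `play(uniform_learner, outcomes)` on a list of outcome indices returns the regret of the baseline on that sequence; a learner that always plays one fixed action can be passed as `lambda history: 0`.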
