The Impatient May Use Limited Optimism to Minimize Regret

Abstract

Discounted-sum games provide a formal model for the study of reinforcement learning, where the agent is enticed to get rewards early since later rewards are discounted. When the agent interacts with the environment, she may realize that, with hindsight, she could have increased her reward by playing differently: this difference in outcomes constitutes her regret value. The agent may thus elect to follow a regret-minimal strategy. In this paper, it is shown that (1) there always exist regret-minimal strategies that are admissible, a strategy being inadmissible if there is another strategy that always performs better; (2) computing the minimum possible regret or checking that a strategy is regret-minimal can be done in coNP^NP, disregarding the computational cost of numerical analysis (otherwise, this bound becomes PSpace).
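
To make the regret notion in the abstract concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the usual worst-case reading of regret: for each environment behaviour, take the gap between the best discounted sum achievable in hindsight and the one actually obtained, then take the largest such gap. The toy game, the strategy names (`greedy`, `patient`, `kind`, `harsh`) and the finite reward sequences are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def discounted_sum(rewards, discount):
    """Sum of discount**i * rewards[i] over a finite reward sequence."""
    return sum(r * discount**i for i, r in enumerate(rewards))


def regret(strategy, outcomes, discount):
    """Worst-case gap between the best hindsight value and the achieved value.

    `outcomes` maps (agent_strategy, env_strategy) to a finite reward sequence.
    This finite table is an illustrative assumption, not the paper's model.
    """
    agent_strategies = {s for s, _ in outcomes}
    env_strategies = {e for _, e in outcomes}
    gaps = []
    for env in env_strategies:
        achieved = discounted_sum(outcomes[(strategy, env)], discount)
        best = max(
            discounted_sum(outcomes[(s, env)], discount) for s in agent_strategies
        )
        gaps.append(best - achieved)
    return max(gaps)


def regret_minimal(outcomes, discount):
    """Return the agent strategy with the smallest regret (ties broken by name)."""
    agent_strategies = sorted({s for s, _ in outcomes})
    return min(agent_strategies, key=lambda s: regret(s, outcomes, discount))


# Hypothetical toy game: grab a small reward now, or wait for a larger one
# that only materialises if the environment is "kind".
OUTCOMES = {
    ("greedy", "kind"): [1, 0, 0],
    ("greedy", "harsh"): [1, 0, 0],
    ("patient", "kind"): [0, 0, 6],
    ("patient", "harsh"): [0, 0, 0],
}

impatient = Fraction(1, 2)   # heavy discounting
far_sighted = Fraction(9, 10)  # light discounting
```

In this toy table, heavy discounting makes the early reward the regret-minimal choice, while a discount factor close to 1 favours waiting, which matches the abstract's remark that discounting entices the agent to collect rewards early.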

Bibliographic record

Source
International Conference on Foundations of Software Science and Computation Structures; European Conferences on Theory and Practice of Software | 2019 | pp. 133-149 | 17 pages
Conference location: Prague (CZ)
Authors
Michaël Cadilhac; Guillermo A. Pérez; Marie van den Bogaard
Author affiliations

University of Oxford, Oxford, UK

University of Antwerp, Antwerp, Belgium

Université libre de Bruxelles, Brussels, Belgium
Original format: PDF
Keywords
Admissibility; Discounted-sum games; Regret minimization;

Similar literature

1. Optimal pricing to minimize maximum regret with limited demand information [J]. Chen Ming, Chen Zhi-Long. Computers & Operations Research. 2020, issue Deca
2. Minimizing Dynamic Regret and Adaptive Regret Simultaneously [J]. Lijun Zhang, Shiyin Lu, Tianbao Yang. JMLR: Workshop and Conference Proceedings. 2020, issue 2010
3. Routing Without Regret: On Convergence to Nash Equilibria of Regret-Minimizing Algorithms in Routing Games [J]. Avrim Blum, Eyal Even-Dar, Katrina Ligett. Theory of Computing. 2010, issue 2
4. The Impatient May Use Limited Optimism to Minimize Regret [C]. Michaël Cadilhac, Guillermo A. Pérez, Marie van den Bogaard. International Conference on Foundations of Software Science and Computation Structures. 2019
5. Regret-Minimizing Algorithms Beyond Classical Optimization and Control [D]. Zhang, Cyril. 2020
6. Scaling up psychology via Scientific Regret Minimization [O]. Mayank Agrawal, Joshua C. Peterson, Thomas L. Griffiths. 2020
7. Risk minimization, regret minimization and progressive hedging algorithms [O]. Jie Sun, Xinmin Yang, Qiang Yao. 2020