Discount and speed/execution tradeoffs in Markov Decision Process games

Abstract

We study Markov Decision Process (MDP) games with the usual ±1 reinforcement signal. We consider the scenario in which the goal of the game, rather than just winning, is to maximize the number of wins in an allotted period of time (or to maximize the expected reward in the same period). In the reinforcement learning literature, this type of tradeoff is often handled by tuning the discount parameter to encourage the learning algorithm to find policies that take fewer steps on average, at the cost of a lower probability of winning. We show that this approach is not guaranteed to solve the tradeoff problem optimally, and hence that a different strategy is needed when tackling problems of this type.
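
The gap between the two criteria can be illustrated with a toy calculation. The sketch below is not taken from the paper; the two policies (`steady`, `gamble`), their win probabilities, and their episode-length distributions are hypothetical numbers chosen for illustration. It assumes that each game is a separate episode, that the ±1 reward arrives on the final step and is discounted by gamma^(T-1), and that episode length is independent of the outcome. It compares the long-run reward per step (the quantity that governs wins in an allotted time) with the discounted value of a single game over a grid of discount values.

```python
# Hypothetical policies: probability of winning a game (+1 for a win,
# -1 for a loss) and a distribution over episode lengths {steps: probability}.
POLICIES = {
    "steady": {"win_prob": 0.75, "lengths": {4: 1.0}},
    "gamble": {"win_prob": 1.0, "lengths": {1: 0.36, 13: 0.64}},
}


def expected_reward(policy):
    """Expected terminal reward of one game under the +1/-1 signal."""
    return 2.0 * policy["win_prob"] - 1.0


def reward_rate(policy):
    """Expected reward per step when games are played back to back."""
    mean_length = sum(t * q for t, q in policy["lengths"].items())
    return expected_reward(policy) / mean_length


def discounted_value(policy, gamma):
    """Expected discounted return of one game; reward lands on the last step."""
    discount = sum(q * gamma ** (t - 1) for t, q in policy["lengths"].items())
    return expected_reward(policy) * discount


def discount_optimal(gamma):
    """Name of the policy with the highest discounted value for this gamma."""
    return max(POLICIES, key=lambda name: discounted_value(POLICIES[name], gamma))


# Policy that collects the most reward per unit of time.
rate_optimal = max(POLICIES, key=lambda name: reward_rate(POLICIES[name]))

# Policies selected by the discounted criterion for gamma = 0.01, ..., 1.00.
gammas = [k / 100 for k in range(1, 101)]
discount_winners = {discount_optimal(g) for g in gammas}

print(rate_optimal, discount_winners)
```

Under these assumptions the script prints `steady {'gamble'}`: `steady` earns more reward per step (0.125 against roughly 0.115), yet the discounted criterion prefers `gamble` at every discount value on the grid, so no setting of the discount recovers the rate-optimal choice in this toy case.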

Bibliographic record

Source: IEEE Symposium on Computational Intelligence and Games, 2011, 8 pages
Authors: Uribe Reinaldo; Lozano Fernando; Shibata Katsunari; Anderson Charles
Original format: PDF
Chinese Library Classification: TP18-53
