We study Markov Decision Process (MDP) games with the usual ±1 reinforcement signal. We consider the scenario in which the goal, rather than simply winning, is to maximize the number of wins within an allotted period of time (or, equivalently, to maximize the expected reward over that period). In the reinforcement learning literature, this tradeoff is often handled by tuning the discount parameter so as to encourage the learning algorithm to find policies that take fewer steps on average, at the cost of a lower probability of winning. We show that this approach is not guaranteed to solve the tradeoff problem optimally, and hence a different strategy is needed when tackling problems of this type.
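As a toy illustration of the tradeoff described above (the two policies and all numbers here are hypothetical, not taken from the paper): under a fixed step budget, the wins-per-time criterion unambiguously prefers a fast, lower-accuracy policy, while the discounted criterion's preference flips depending on how the discount parameter γ is tuned.

```python
# Toy example (hypothetical numbers): each policy plays episodes of fixed
# length and wins independently with a fixed probability; the terminal
# reward is +1 for a win and -1 for a loss.

def expected_wins(p, n, budget):
    """Expected number of wins within a fixed budget of steps,
    for a policy that wins with probability p in episodes of n steps."""
    return (budget / n) * p

def discounted_value(p, n, gamma):
    """Expected discounted return of one episode with +/-1 terminal
    reward delivered after n steps: (2p - 1) * gamma**n."""
    return (2 * p - 1) * gamma ** n

A = (0.9, 10)  # slow but accurate: wins 90% of 10-step episodes
B = (0.6, 2)   # fast but sloppy: wins 60% of 2-step episodes

# Wins-in-a-budget criterion: B is clearly better (30 vs. 9 expected wins).
print(expected_wins(*A, 100), expected_wins(*B, 100))  # 9.0  30.0

# Discounted criterion: the preferred policy depends on gamma.
for gamma in (0.99, 0.8):
    vA = discounted_value(*A, gamma)
    vB = discounted_value(*B, gamma)
    print(f"gamma={gamma}: prefers {'A' if vA > vB else 'B'}")
# gamma=0.99 prefers A; gamma=0.8 prefers B
```

Lowering γ does steer the discounted criterion toward the faster policy here, but the γ at which the preference flips depends on the particular policies being compared, which is why tuning a single discount parameter is not guaranteed to resolve the tradeoff optimally in general.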