2011 IEEE Conference on Computational Intelligence and Games

Discount and speed/execution tradeoffs in Markov Decision Process games

Abstract

We study Markov Decision Process (MDP) games with the usual ±1 reinforcement signal. We consider the scenario in which the goal of the game is not just to win but to maximize the number of wins in an allotted period of time (or to maximize the expected reward in that period). In the reinforcement learning literature, this type of tradeoff is often handled by tuning the discount parameter to encourage the learning algorithm to find policies that take fewer steps on average, at the cost of a lower probability of winning. We show that this approach is not guaranteed to solve the tradeoff problem optimally, and hence that a different strategy is needed when tackling this type of problem.
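The tension described in the abstract can be illustrated with a small sketch. It is not taken from the paper and does not reproduce its argument. It assumes two hypothetical policies, each with a fixed episode length and a fixed win probability, a terminal reward of +1 for a win and -1 for a loss, and episodes played back to back. The names `discounted_value`, `reward_rate`, and `policies`, and all the numbers, are illustrative assumptions.

```python
# Illustrative sketch only: hypothetical policies and numbers, not from the paper.

def discounted_value(p_win, steps, gamma):
    """Expected discounted return of a single episode.

    Assumes the only reward is terminal (+1 win, -1 loss) and arrives
    after exactly `steps` time steps, so it is discounted by gamma**steps.
    """
    return gamma ** steps * (2.0 * p_win - 1.0)


def reward_rate(p_win, steps):
    """Expected reward per time step when episodes are repeated back to back."""
    return (2.0 * p_win - 1.0) / steps


# Hypothetical policies: name -> (win probability, steps per episode).
policies = {
    "fast": (0.6, 2),    # wins less often, but episodes are short
    "slow": (0.9, 10),   # wins more often, but episodes are long
}


def best_by_discount(gamma):
    """Policy preferred by the discounted criterion for a given gamma."""
    return max(policies, key=lambda name: discounted_value(*policies[name], gamma))


def best_by_rate():
    """Policy that earns the most expected reward per time step."""
    return max(policies, key=lambda name: reward_rate(*policies[name]))


if __name__ == "__main__":
    print("best by reward rate:", best_by_rate())
    for gamma in (0.99, 0.9, 0.8):
        print(f"gamma={gamma}: best by discounted value:", best_by_discount(gamma))
```

In this toy setting the policy preferred by the discounted criterion changes with the discount, while the reward-per-step ranking is fixed, so a discount chosen without knowing the policies' lengths and win probabilities can select the policy that earns less per unit of time.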