Conference: IEEE Symposium on Computational Intelligence and Games
Discount and speed/execution tradeoffs in Markov Decision Process games

Abstract

We study Markov Decision Process (MDP) games with the usual ±1 reinforcement signal. We consider the scenario in which the goal of the game, rather than just winning, is to maximize the number of wins in an allotted period of time (or to maximize the expected reward in the same period). In the reinforcement learning literature, this type of tradeoff is often handled by tuning the discount parameter in order to encourage the learning algorithm to find policies that take fewer steps on average, at the cost of a lower probability of winning. We show that this approach is not guaranteed to solve the tradeoff problem optimally, and hence a different strategy is needed when tackling this type of problem.
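
The gap between the two objectives can be illustrated with a toy calculation. The sketch below is not the paper's construction; the two policies, their outcome probabilities, and their episode lengths are hypothetical numbers chosen for illustration. It assumes episodes are played back to back, so the long-run reward per step is the expected reward per episode divided by the expected episode length, and it assumes the ±1 reward is paid at the final step of each episode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    prob: float    # probability of this episode outcome
    reward: float  # terminal reward: +1 for a win, -1 for a loss
    steps: int     # number of steps the episode lasts


def reward_rate(policy):
    """Long-run expected reward per step over repeated episodes."""
    mean_reward = sum(o.prob * o.reward for o in policy)
    mean_steps = sum(o.prob * o.steps for o in policy)
    return mean_reward / mean_steps


def discounted_value(policy, gamma):
    """Expected discounted return of one episode (reward at the last step)."""
    return sum(o.prob * o.reward * gamma ** o.steps for o in policy)


# Hypothetical policy that wins or loses after a single step.
quick = [Outcome(0.6, +1, 1), Outcome(0.4, -1, 1)]
# Hypothetical policy with the same win probability that drags out its losses.
stalling = [Outcome(0.6, +1, 1), Outcome(0.4, -1, 100)]

# Discount factors to try (an arbitrary sample of values below 1).
gammas = [0.5, 0.9, 0.99, 0.999]
discount_prefers_stalling = all(
    discounted_value(stalling, g) > discounted_value(quick, g) for g in gammas
)

print("reward rate, quick:   ", reward_rate(quick))
print("reward rate, stalling:", reward_rate(stalling))
print("discount prefers stalling for all sampled gammas:",
      discount_prefers_stalling)
```

In this toy setting the quick policy earns far more reward per step, yet every sampled discount factor below 1 ranks the stalling policy higher, because discounting shrinks the delayed loss while ignoring the time it wastes.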
