Journal: IEEE Transactions on Automatic Control

Evolutionary Policy Iteration for Solving Markov Decision Processes

Abstract

We propose a novel algorithm called evolutionary policy iteration (EPI) for solving infinite-horizon discounted-reward Markov decision processes. EPI inherits the spirit of policy iteration (PI) but eliminates the need to maximize over the entire action space in the policy improvement step, so it should be most effective for problems with very large action spaces. EPI iteratively generates a "population", that is, a set of policies, such that the performance of the population's "elite policy" improves monotonically with respect to a defined fitness function. EPI converges with probability one to a population whose elite policy is an optimal policy. EPI is naturally parallelizable, and in this connection a distributed variant of PI is also studied.
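
The abstract does not spell out EPI's operators, so the following Python sketch is only one plausible reading of the population-and-elite idea, not the authors' implementation. It assumes the elite policy is formed by "policy switching" (in each state, copy the action of the population member with the highest value there) and that the next population is the elite plus randomly mutated copies of it. The names `evaluate`, `policy_switching`, `epi`, and `mutate_prob`, as well as the toy two-state MDP, are illustrative assumptions.

```python
import random


def evaluate(policy, P, R, gamma, sweeps=500):
    """Iterative policy evaluation for a fixed policy.

    P[s][a][t] is the transition probability s -> t under action a,
    R[s][a] is the one-step reward, gamma is the discount factor.
    """
    n = len(P)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [
            R[s][policy[s]]
            + gamma * sum(p * v for p, v in zip(P[s][policy[s]], V))
            for s in range(n)
        ]
    return V


def policy_switching(population, values):
    """Assumed elite construction: per state, take the action of the
    population member whose value at that state is largest."""
    n = len(population[0])
    elite = []
    for s in range(n):
        best = max(range(len(population)), key=lambda i: values[i][s])
        elite.append(population[best][s])
    return elite


def epi(P, R, gamma, population=None, pop_size=4,
        generations=30, mutate_prob=0.3, seed=0):
    """Sketch of an EPI-style loop; no maximization over all actions."""
    rng = random.Random(seed)
    n, m = len(P), len(P[0])
    if population is None:
        population = [[rng.randrange(m) for _ in range(n)]
                      for _ in range(pop_size)]
    else:
        population = [list(pi) for pi in population]
        pop_size = len(population)
    for _ in range(generations):
        values = [evaluate(pi, P, R, gamma) for pi in population]
        elite = policy_switching(population, values)
        # Assumed mutation: resample each state's action with
        # probability mutate_prob; the elite itself is always kept.
        mutants = [
            [rng.randrange(m) if rng.random() < mutate_prob else a
             for a in elite]
            for _ in range(pop_size - 1)
        ]
        population = [elite] + mutants
    values = [evaluate(pi, P, R, gamma) for pi in population]
    elite = policy_switching(population, values)
    return elite, evaluate(elite, P, R, gamma)


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action MDP, for illustration only.
    P = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.1, 0.9]]]
    R = [[1.0, 0.0], [0.0, 2.0]]
    print(epi(P, R, gamma=0.9))
```

Because the elite is carried into every new population and policy switching never does worse than any member in any state, the elite's value in this sketch cannot decrease from one generation to the next, which mirrors the monotonicity claim in the abstract. Each generation costs one policy evaluation per population member rather than a maximization over the whole action space, and those evaluations are independent of one another, which is where the parallelism would come from.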
