Journal: INFORMS Journal on Computing
An Evolutionary Random Policy Search Algorithm for Solving Markov Decision Processes

Abstract

This paper presents a new randomized search method called evolutionary random policy search (ERPS) for solving infinite-horizon discounted-cost Markov-decision-process (MDP) problems. The algorithm is particularly targeted at problems with large or uncountable action spaces. ERPS approaches a given MDP by iteratively dividing it into a sequence of smaller, random, sub-MDP problems based on information obtained from random sampling of the entire action space and local search. Each sub-MDP is then solved approximately by using a variant of the standard policy-improvement technique, where an elite policy is obtained. We show that the sequence of elite policies converges to an optimal policy with probability one. Some numerical studies are carried out to illustrate the algorithm and compare it with existing procedures.
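The loop described in the abstract can be illustrated with a small sketch. This is not the paper's procedure: it assumes a finite MDP with integer-indexed actions, keeps a single elite policy, and replaces the paper's policy-improvement variant with one ordinary greedy improvement step restricted to a randomly built candidate action set per state (the "sub-MDP"). The names `erps_sketch`, `evaluate_policy`, `n_samples`, and `explore_prob` are hypothetical and chosen only for illustration.

```python
import random


def evaluate_policy(P, C, policy, gamma, sweeps=500):
    """Approximate the discounted cost of a fixed policy by repeated backups.

    P[s][a] is a list of transition probabilities over next states,
    C[s][a] is the one-stage cost of taking action a in state s.
    """
    n_states = len(P)
    v = [0.0] * n_states
    for _ in range(sweeps):
        v = [
            C[s][policy[s]]
            + gamma * sum(p * v[t] for t, p in enumerate(P[s][policy[s]]))
            for s in range(n_states)
        ]
    return v


def erps_sketch(P, C, gamma=0.9, n_iters=30, n_samples=3,
                explore_prob=0.5, seed=0):
    """Simplified illustration only (assumption: finite, ordered action set)."""
    rng = random.Random(seed)
    n_states, n_actions = len(P), len(P[0])
    # Start from an arbitrary random policy and treat it as the first elite.
    elite = [rng.randrange(n_actions) for _ in range(n_states)]

    for _ in range(n_iters):
        v = evaluate_policy(P, C, elite, gamma)
        improved = []
        for s in range(n_states):
            # Build the random sub-MDP for state s: always keep the elite
            # action, then add a few sampled candidates.
            candidates = {elite[s]}
            for _ in range(n_samples):
                if rng.random() < explore_prob:
                    # Sample from the entire action space.
                    a = rng.randrange(n_actions)
                else:
                    # Local search: a neighbor of the current elite action.
                    a = elite[s] + rng.choice((-1, 1))
                    a = min(max(a, 0), n_actions - 1)
                candidates.add(a)

            def q_value(a, s=s):
                return C[s][a] + gamma * sum(
                    p * v[t] for t, p in enumerate(P[s][a])
                )

            # Greedy improvement restricted to the candidate set.
            improved.append(min(sorted(candidates), key=q_value))
        elite = improved

    return elite, evaluate_policy(P, C, elite, gamma)
```

Because the elite action always stays in the candidate set, each restricted improvement step cannot increase the discounted cost of the elite policy (up to numerical error), which is the property that makes the sequence of elite policies worth tracking. The sampling rule, the neighborhood definition, and the number of candidates per state are assumptions of this sketch; the paper's own choices should be taken from the original article.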
