Evolutionary Policy Iteration for Solving Markov Decision Processes

Hyeong Soo Chang; Hong-Gi Lee; Michael C. Fu; Steven I. Marcus


Abstract

We propose a novel algorithm called evolutionary policy iteration (EPI) for solving infinite-horizon discounted-reward Markov decision processes. EPI inherits the spirit of policy iteration (PI) but eliminates the need to maximize over the entire action space in the policy improvement step, so it should be most effective for problems with very large action spaces. EPI iteratively generates a "population", that is, a set of policies, such that the performance of the population's "elite policy" improves monotonically with respect to a defined fitness function. EPI converges with probability one to a population whose elite policy is an optimal policy. EPI is naturally parallelizable, and in the course of that discussion a distributed variant of PI is also studied.
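The abstract does not spell out the algorithm's steps, so the following Python sketch is only an illustration of the general idea under stated assumptions, not the authors' exact procedure. It assumes the elite policy is built by "policy switching" (in each state, copy the action of whichever population member has the highest value there), that the fitness function is the value averaged uniformly over states, and that new policies are random mutations of the elite. The names `random_mdp`, `evaluate`, `elite_by_policy_switching`, and `epi`, and all parameter values, are hypothetical.

```python
import random


def random_mdp(n_states, n_actions, seed=0):
    """Build a small random MDP (hypothetical test problem).

    P[x][a] is a probability distribution over next states,
    R[x][a] is a one-step reward in [0, 1).
    """
    rng = random.Random(seed)
    P, R = [], []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            w = [rng.random() + 1e-3 for _ in range(n_states)]
            total = sum(w)
            rows.append([v / total for v in w])
        P.append(rows)
        R.append([rng.random() for _ in range(n_actions)])
    return P, R


def evaluate(policy, P, R, gamma, sweeps=500):
    """Iterative policy evaluation of a fixed deterministic policy."""
    n = len(policy)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [
            R[x][policy[x]]
            + gamma * sum(p * V[y] for y, p in enumerate(P[x][policy[x]]))
            for x in range(n)
        ]
    return V


def elite_by_policy_switching(population, P, R, gamma):
    """Assumed elite construction: per state, follow the best member."""
    values = [evaluate(pi, P, R, gamma) for pi in population]
    n = len(population[0])
    elite = []
    for x in range(n):
        best = max(range(len(population)), key=lambda i: values[i][x])
        elite.append(population[best][x])
    return elite


def epi(P, R, gamma, pop_size=5, generations=30, mutation_prob=0.2, seed=0):
    """Sketch of an EPI-style loop; returns the elite and its fitness history."""
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    population = [[rng.randrange(m) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        elite = elite_by_policy_switching(population, P, R, gamma)
        # Assumed fitness: value averaged uniformly over states.
        history.append(sum(evaluate(elite, P, R, gamma)) / n)
        # Keep the elite and fill the rest with mutated copies. Actions are
        # sampled at random, so no maximization over the action space occurs.
        population = [elite] + [
            [rng.randrange(m) if rng.random() < mutation_prob else a for a in elite]
            for _ in range(pop_size - 1)
        ]
    return elite, history


if __name__ == "__main__":
    P, R = random_mdp(n_states=4, n_actions=3, seed=1)
    elite, history = epi(P, R, gamma=0.9)
    print("elite policy:", elite)
    print("fitness of first and last elite:", history[0], history[-1])
```

Because the elite is carried into the next population, its fitness in this sketch never decreases from one generation to the next, which mirrors the monotone improvement described in the abstract. The per-member evaluations inside `elite_by_policy_switching` are independent of one another, which is where a parallel implementation could split the work.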


Bibliographic details

Source: IEEE Transactions on Automatic Control, 2005, no. 11, pp. 1804-1808 (5 pages)
Authors: Hyeong Soo Chang; Hong-Gi Lee; Michael C. Fu; Steven I. Marcus
Original format: PDF
Language: English
Chinese Library Classification: Automation systems
Keywords: (Distributed) policy iteration; Evolutionary algorithm; Genetic algorithm; Markov decision process; Parallelization

