Learning Optimal Policies in Markov Decision Processes with Value Function Discovery

Martijn Onderwater; Sandjai Bhulai; Rob van der Mei

Abstract

In this paper we describe recent progress in our work on Value Function Discovery (VFD), a novel method for discovery of value functions for Markov Decision Processes (MDPs). In a previous paper we described how VFD discovers algebraic descriptions of value functions (and the corresponding policies) using ideas from the Evolutionary Algorithm field. A special feature of VFD is that the descriptions include the model parameters of the MDP. We extend that work and show how additional information about the structure of the MDP can be included in VFD. This alternative use of VFD still yields near-optimal policies, and is much faster. Besides increased performance and improved run times, this approach illustrates that VFD is not restricted to learning value functions and can be applied more generally.
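To make the link between an algebraic value function and "the corresponding policy" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical admission-control queue (arrival probability `lam`, service probability `mu`, holding cost `h`, rejection cost `r`, discount factor `gamma`, buffer size `capacity`) and a hypothetical candidate value function written in terms of those model parameters. None of these names or numbers come from the paper. The policy is obtained by a one-step greedy lookahead on the candidate.

```python
from typing import Callable, List

REJECT, ADMIT = 0, 1


def make_candidate(lam: float, mu: float, h: float) -> Callable[[int], float]:
    """Hypothetical algebraic candidate V(x) that contains the model parameters."""
    def value(x: int) -> float:
        return h * x * (x + 1) / (2.0 * (mu - lam))
    return value


def q_value(x: int, action: int, value: Callable[[int], float],
            lam: float, mu: float, h: float, r: float,
            gamma: float, capacity: int) -> float:
    """Immediate cost plus discounted expected value of the next state."""
    if action == ADMIT and x < capacity:
        arrival = gamma * value(x + 1)      # arrival joins the queue
    else:
        arrival = r + gamma * value(x)      # arrival is rejected at cost r
    departure = gamma * value(max(x - 1, 0))  # service completion (if any job)
    idle = gamma * value(x)                   # nothing happens this step
    return h * x + lam * arrival + mu * departure + (1.0 - lam - mu) * idle


def greedy_policy(value: Callable[[int], float], lam: float, mu: float,
                  h: float, r: float, gamma: float, capacity: int) -> List[int]:
    """For every state, pick the action with the smallest one-step lookahead cost."""
    return [
        min((REJECT, ADMIT),
            key=lambda a: q_value(x, a, value, lam, mu, h, r, gamma, capacity))
        for x in range(capacity + 1)
    ]


# Illustrative parameter values (assumptions, not taken from the paper).
lam, mu, h, r, gamma, capacity = 0.3, 0.5, 1.0, 20.0, 0.95, 10
policy = greedy_policy(make_candidate(lam, mu, h), lam, mu, h, r, gamma, capacity)
print(policy)
```

Because the candidate is an explicit formula in `lam`, `mu`, and `h`, the same expression can be re-evaluated for other parameter values without solving the model again, which is the practical appeal of a value function that includes the model parameters.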


Bibliographic details

Source
Performance Evaluation Review, 2015, no. 2, pp. 7-9 (3 pages)
Authors
Martijn Onderwater; Sandjai Bhulai; Rob van der Mei
Affiliations

CWI, Stochastics Group, Science Park 123, 1098XG Amsterdam, The Netherlands; VU University Amsterdam, Faculty of Sciences, De Boelelaan 1081a, 1081HV Amsterdam, The Netherlands

VU University Amsterdam, Faculty of Sciences, De Boelelaan 1081a, 1081HV Amsterdam, The Netherlands

CWI, Stochastics Group, Science Park 123, 1098XG Amsterdam, The Netherlands; VU University Amsterdam, Faculty of Sciences, De Boelelaan 1081a, 1081HV Amsterdam, The Netherlands

Original format: PDF
Language: English
Keywords
Markov Decision Processes; Evolutionary Algorithms; Value Function; Genetic Programming
