Policy learning in continuous-time Markov decision processes using Gaussian Processes

Abstract

Continuous-time Markov decision processes (CTMDPs) provide a powerful mathematical framework for policy-making problems in a wide range of applications, from the control of populations to cyber–physical systems. The key problem for these models is to efficiently compute an optimal policy that controls the system so as to maximise the probability of satisfying a set of temporal logic specifications. Here we introduce a novel method based on statistical model checking and an unbiased estimate of a functional gradient in the space of possible policies. Our approach has several advantages over classical methods based on discretisation techniques: it does not assume a-priori knowledge of the model, which can be replaced by a black box, and it does not suffer from state-space explosion. Guiding the search with a momentum-based stochastic gradient ascent algorithm considerably improves the efficiency of policy learning, and the momentum term accelerates convergence. We demonstrate the strong performance of our approach on two examples of non-linear population models: an epidemiology model with no permanent recovery and a queuing system with non-deterministic choice.
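The abstract combines three computational ingredients: statistical model checking (estimating a satisfaction probability from simulated trajectories), an unbiased gradient estimate with respect to the policy, and stochastic gradient ascent with a momentum term. The Python sketch below shows how these pieces can fit together on a hypothetical toy problem; it is not the authors' implementation. Assumptions: an SIS-style epidemic (no permanent recovery) with a single policy parameter `theta` that sets the treatment (recovery) rate, a time-bounded reachability property ("the infection dies out before time `T`"), and the standard likelihood-ratio (score-function) gradient estimator. That estimator is swapped in for illustration in place of the paper's functional gradient over policies, which the title indicates is built on Gaussian processes. All names and constants (`simulate`, `estimate`, `momentum_ascent`, `N`, `BETA`, `G_LOW`, `G_HIGH`, `T`) are hypothetical.

```python
import math
import random

# Hypothetical toy model (not from the paper): SIS epidemic with N individuals.
# Infection: i -> i+1 at rate BETA * i * (N - i) / N.
# Recovery:  i -> i-1 at rate gamma(theta) * i, where the policy parameter
# theta sets gamma(theta) = G_LOW + (G_HIGH - G_LOW) * sigmoid(theta).
N, I0, BETA, G_LOW, G_HIGH, T = 20, 5, 1.0, 0.2, 1.5, 5.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate(theta, rng):
    """Simulate one trajectory (Gillespie algorithm) on [0, T].

    Returns (satisfied, score): satisfied is True if the infection dies out
    (i == 0) before time T; score is d/dtheta of the path log-likelihood,
    i.e. sum(log rate of each jump) - integral of the total exit rate.
    """
    s = sigmoid(theta)
    gamma = G_LOW + (G_HIGH - G_LOW) * s
    dgamma = (G_HIGH - G_LOW) * s * (1.0 - s)  # d gamma / d theta
    t, i, score = 0.0, I0, 0.0
    while i > 0:
        inf_rate = BETA * i * (N - i) / N
        rec_rate = gamma * i
        total = inf_rate + rec_rate
        dt = rng.expovariate(total)
        if t + dt > T:
            # No further jump before T: only the survival term remains.
            score -= dgamma * i * (T - t)
            return False, score
        t += dt
        score -= dgamma * i * dt  # survival term over this holding time
        if rng.random() < rec_rate / total:
            score += dgamma / gamma  # d/dtheta log(gamma * i)
            i -= 1
        else:
            i += 1  # infection rate does not depend on theta
    return True, score  # i == 0 is absorbing, so the property holds


def estimate(theta, n, rng):
    """Statistical model checking by Monte Carlo: returns estimates of
    P(phi) and of dP(phi)/dtheta via the likelihood-ratio identity
    dP/dtheta = E[1{phi} * score]."""
    hits, grad = 0, 0.0
    for _ in range(n):
        ok, score = simulate(theta, rng)
        if ok:
            hits += 1
            grad += score
    return hits / n, grad / n


def momentum_ascent(grad_fn, x0, lr=0.1, beta=0.9, steps=200):
    """Heavy-ball gradient ascent: v <- beta * v + lr * g(x); x <- x + v."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v + lr * grad_fn(x)
        x += v
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    p_before, _ = estimate(0.0, 1000, rng)
    # Each ascent step uses a fresh, noisy Monte Carlo gradient estimate.
    theta = momentum_ascent(lambda th: estimate(th, 100, rng)[1],
                            0.0, lr=1.0, beta=0.9, steps=30)
    p_after, _ = estimate(theta, 1000, rng)
    print(f"theta={theta:.2f}  P(phi) before={p_before:.3f}  after={p_after:.3f}")
```

Because a higher recovery rate only ever helps in this toy model, the gradient keeps pushing `theta` upward until the sigmoid saturates; a realistic policy would trade off competing actions or costs. The plain score-function estimator is unbiased but can have high variance; subtracting a constant baseline from the indicator keeps it unbiased and is a common variance-reduction step that this sketch omits.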

Bibliographic record

Authors: Ezio Bartocci; Luca Bortolussi; Tomáš Brázdil; Dimitrios Milios; Guido Sanguinetti
Year: 2017
Original format: PDF
Language: English

Similar documents

1. Policy learning in continuous-time Markov decision processes using Gaussian Processes [J]. Bartocci Ezio, Bortolussi Luca, Brazdil Tomas, et al. Performance Evaluation, 2017.

2. Optimality of Mixed Policies for Average Continuous-Time Markov Decision Processes with Constraints [J]. Guo Xianping, Zhang Yi. Mathematics of Operations Research, 2016, no. 4.

3. First Passage Optimality for Continuous-Time Markov Decision Processes With Varying Discount Factors and History-Dependent Policies [J]. Guo X., Song X., Zhang Y. IEEE Transactions on Automatic Control, 2014, no. 1.

4. Sufficiency of Markov policies for continuous-time Markov decision processes and solutions to Kolmogorov's forward equation for jump Markov processes [C]. Feinberg E.A., Mandava M., Shiryaev A.N. IEEE Annual Conference on Decision and Control, 2013.

5. Finite memory policies for partially observable Markov decision processes [D]. Lusena, Christopher David. 2001.

6. Evolving Robust Policy Coverage Sets in Multi-Objective Markov Decision Processes Through Intrinsically Motivated Self-Play [O]. Sherif Abdelfattah, Kathryn Kasmarik, Jiankun Hu. 2018.

7. Policy learning for time-bounded reachability in continuous-time Markov decision processes via doubly-stochastic gradient ascent [O]. Bartocci Ezio, Bortolussi Luca, Brázdil Tomáš. 2016.

