
Policy learning in continuous-time Markov decision processes using Gaussian Processes

Abstract

Continuous-time Markov decision processes provide a powerful mathematical framework for solving policy-making problems in a wide range of applications, from the control of populations to cyber–physical systems. The key problem for these models is to efficiently compute an optimal policy that controls the system so as to maximise the probability of satisfying a set of temporal logic specifications. Here we introduce a novel method based on statistical model checking and an unbiased estimate of the functional gradient in the space of possible policies. Our approach offers several advantages over classical methods based on discretisation techniques: it does not require a-priori knowledge of the model, which can be replaced by a black box, and it does not suffer from state-space explosion. The use of a stochastic momentum-based gradient ascent algorithm to guide the search considerably improves the efficiency of policy learning, and the momentum term accelerates convergence. We demonstrate the strong performance of our approach on two examples of non-linear population models: an epidemiology model with no permanent recovery and a queuing system with non-deterministic choice.
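To make the search procedure described in the abstract concrete, the following Python sketch illustrates gradient ascent with a momentum term over policy parameters, using only black-box trajectory samples. It is a minimal illustration under stated assumptions, not the paper's algorithm: the names `simulate` and `theta` are hypothetical, and a simple finite-difference surrogate stands in for the paper's unbiased functional-gradient estimator and its Gaussian-process machinery.

```python
import numpy as np

def estimate_gradient(theta, simulate, n_samples=100, eps=1e-2):
    """Monte-Carlo finite-difference estimate of the gradient of the
    satisfaction probability with respect to policy parameters theta.
    `simulate(theta)` is an assumed black-box that samples one CTMDP
    trajectory under the policy and returns 1 if it satisfies the
    temporal-logic specification, else 0. (The paper instead uses an
    unbiased functional-gradient estimator; this surrogate is only
    for illustration.)"""
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        theta_plus, theta_minus = theta.copy(), theta.copy()
        theta_plus[i] += eps
        theta_minus[i] -= eps
        p_plus = np.mean([simulate(theta_plus) for _ in range(n_samples)])
        p_minus = np.mean([simulate(theta_minus) for _ in range(n_samples)])
        grad[i] = (p_plus - p_minus) / (2 * eps)
    return grad

def gradient_ascent_with_momentum(theta0, simulate, lr=0.1, beta=0.9,
                                  n_iters=50):
    """Stochastic gradient ascent with a momentum term: the velocity
    smooths noisy gradient estimates across iterations, which is what
    accelerates convergence of the policy search."""
    theta = theta0.copy()
    velocity = np.zeros_like(theta0)
    for _ in range(n_iters):
        grad = estimate_gradient(theta, simulate)
        velocity = beta * velocity + grad   # accumulate momentum
        theta = theta + lr * velocity       # ascend: maximise probability
    return theta
```

Because the satisfaction probability is estimated from sampled trajectories, each gradient estimate is noisy; the momentum term averages out this noise, which is why it helps convergence in a statistical-model-checking setting where no explicit model is available.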
