International Conference on Quantitative Evaluation of Systems

Policy Learning for Time-Bounded Reachability in Continuous-Time Markov Decision Processes via Doubly-Stochastic Gradient Ascent

Abstract

Continuous-time Markov decision processes are an important class of models for applications ranging from cyber-physical systems to synthetic biology. A central problem is how to devise a policy to control the system so as to maximise the probability of satisfying a set of temporal logic specifications. Here we present a novel approach based on statistical model checking and an unbiased estimate of the functional gradient in the space of possible policies. The statistical approach has several advantages over conventional approaches based on uniformisation: it also applies when the model is available only as a black box, and it does not suffer from state-space explosion. Using a stochastic gradient to guide the search considerably improves the efficiency of policy learning. We demonstrate the method on a proof-of-principle non-linear population model, showing strong performance in a non-trivial task.
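
To make the setting concrete, below is a minimal Python sketch of simulation-based policy-gradient ascent for time-bounded reachability in a CTMDP. It pairs statistical model checking (Monte Carlo estimation of the reachability probability) with a plain score-function (REINFORCE-style) gradient estimator for a softmax policy. The toy three-state model, the parameterisation, and all names (rates, P, rollout) are illustrative assumptions, not the paper's benchmark; the score-function estimator here is a simple stand-in for the paper's doubly-stochastic estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-state CTMDP (illustrative; not the paper's model).
# rates[s, a] : exit rate of the exponential sojourn in state s under action a
# P[s, a, s'] : jump probabilities after leaving s under action a
n_states, n_actions, goal, horizon = 3, 2, 2, 5.0
rates = np.array([[1.0, 2.0],
                  [1.5, 0.5],
                  [1.0, 1.0]])
P = np.array([[[0.0, 0.8, 0.2], [0.0, 0.2, 0.8]],
              [[0.5, 0.0, 0.5], [0.9, 0.0, 0.1]],
              [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]])  # goal state is absorbing

def softmax_policy(theta, s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def rollout(theta):
    """Simulate one trajectory up to the time bound. Returns the reachability
    indicator and the score sum_t grad_theta log pi(a_t | s_t)."""
    s, t, score = 0, 0.0, np.zeros_like(theta)
    while s != goal:
        probs = softmax_policy(theta, s)
        a = rng.choice(n_actions, p=probs)
        score[s] -= probs            # grad of log-softmax: e_a - pi(.|s)
        score[s, a] += 1.0
        t += rng.exponential(1.0 / rates[s, a])  # exponential sojourn time
        if t > horizon:
            return 0.0, score        # time bound hit before reaching the goal
        s = rng.choice(n_states, p=P[s, a])
    return 1.0, score

theta = np.zeros((n_states, n_actions))  # policy parameters
lr, batch = 0.5, 200
for it in range(101):
    samples = [rollout(theta) for _ in range(batch)]
    reach = np.mean([r for r, _ in samples])             # SMC estimate of the objective
    grad = np.mean([r * g for r, g in samples], axis=0)  # unbiased REINFORCE gradient
    theta += lr * grad
    if it % 20 == 0:
        print(f"iter {it:3d}  estimated P(reach goal by T={horizon}) = {reach:.3f}")
```

The estimator only requires the ability to simulate trajectories and check whether the goal is reached within the time bound, which is why the statistical approach also works on black-box models; the overall loop (simulate trajectories, estimate the objective and its gradient, take an ascent step) reflects the general approach described in the abstract.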