【24h】

Tableaux for Policy Synthesis for MDPs with PCTL* Constraints

机译:具有PCTL *约束的MDP策略综合的Tableaux

获取原文

摘要

Markov decision processes (MDPs) are the standard formalism for modelling sequential decision making in stochastic environments. Policy synthesis addresses the problem of how to control or limit the decisions an agent makes so that a given specification is met. In this paper we consider PCTL*, the probabilistic counterpart of CTL*, as the specification language. Because in general the policy synthesis problem for PCTL* is undecidable, we restrict to policies whose execution history memory is finitely bounded a priori. Surprisingly, no algorithm for policy synthesis for this natural and expressive framework has been developed so far. We close this gap and describe a tableau-based algorithm that, given an MDP and a PCTL* specification, derives in a non-deterministic way a system of (possibly nonlinear) equalities and inequalities. The solutions of this system, if any, describe the desired (stochastic) policies. Our main result in this paper is the correctness of our method, i.e., soundness, completeness and termination.
机译:马尔可夫决策过程(MDP)是在随机环境中对顺序决策建模的标准形式主义。策略综合解决了如何控制或限制代理做出的决定以便满足给定规范的问题。在本文中,我们将PCTL *(CTL *的概率对应物)视为规范语言。由于通常无法确定PCTL *的策略综合问题,因此我们将其执行历史内存限制为先验限制的策略。令人惊讶的是,到目前为止,尚未开发出用于此自然表达框架的策略综合算法。我们缩小了这一差距,并描述了一种基于表格的算法,该算法在给定MDP和PCTL *规范的情况下,以不确定性的方式推导了(可能是非线性的)等式和不等式的系统。该系统的解决方案(如果有)描述了所需的(随机)策略。本文的主要结果是方法的正确性,即稳健性,完整性和终止性。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号