Tableaux for Policy Synthesis for MDPs with PCTL* Constraints


Abstract

Markov decision processes (MDPs) are the standard formalism for modelling sequential decision making in stochastic environments. Policy synthesis addresses the problem of how to control or limit the decisions an agent makes so that a given specification is met. In this paper we consider PCTL*, the probabilistic counterpart of CTL*, as the specification language. Because in general the policy synthesis problem for PCTL* is undecidable, we restrict to policies whose execution history memory is finitely bounded a priori. Surprisingly, no algorithm for policy synthesis for this natural and expressive framework has been developed so far. We close this gap and describe a tableau-based algorithm that, given an MDP and a PCTL* specification, derives in a non-deterministic way a system of (possibly nonlinear) equalities and inequalities. The solutions of this system, if any, describe the desired (stochastic) policies. Our main result in this paper is the correctness of our method, i.e., soundness, completeness and termination.
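The kind of constraint system mentioned in the abstract can be illustrated on a toy case. The following is a minimal sketch and not the paper's tableau algorithm: it assumes a hypothetical three-state MDP (s0, goal, fail), a memoryless stochastic policy with a single parameter x (the probability of choosing the action "risky" in s0), and a single reachability bound of the form "goal is eventually reached with probability at least 0.6". All state names, action names, and probabilities are invented for illustration. The unknowns are x and the reachability probability v, and the equality linking them contains the product x*v, which is why such systems are nonlinear in general.

```python
# Hypothetical 3-state MDP: s0 (initial), goal, fail (goal and fail absorbing).
# In s0 the agent picks "risky" with probability x and "retry" with 1 - x.
# All numbers below are illustrative assumptions, not taken from the paper.
RISKY = {"goal": 0.7, "fail": 0.3}
RETRY = {"goal": 0.1, "fail": 0.1, "s0": 0.8}


def residual(x, v):
    """Residual of the equality v = x*0.7 + (1 - x)*(0.1 + 0.8*v).

    v stands for the probability of eventually reaching goal from s0.
    The product x*v makes the system nonlinear in the unknowns (x, v).
    """
    rhs = x * RISKY["goal"] + (1 - x) * (RETRY["goal"] + RETRY["s0"] * v)
    return v - rhs


def reach_probability(x):
    """Solve residual(x, v) = 0 for v; for a fixed x the equation is linear in v."""
    numerator = x * RISKY["goal"] + (1 - x) * RETRY["goal"]
    denominator = 1 - (1 - x) * RETRY["s0"]
    return numerator / denominator


def feasible_policies(threshold, steps=100):
    """Grid-search x in [0, 1] for policies meeting the bound v >= threshold."""
    grid = [i / steps for i in range(steps + 1)]
    return [x for x in grid if reach_probability(x) >= threshold]


if __name__ == "__main__":
    xs = feasible_policies(0.6)
    print(f"smallest feasible x on the grid: {xs[0]:.2f}")
    print(f"reach probability there: {reach_probability(xs[0]):.4f}")
```

A grid search is used here only to keep the sketch dependency-free. In a realistic setting the equalities and inequalities would be handed to a nonlinear constraint solver, and a bounded execution history would add further policy parameters, one set per remembered history.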


Bibliographic details

Source: International Conference on Automated Reasoning with Analytic Tableaux and Related Methods, 2017, pp. 175-192 (18 pages)
Conference location: Brasilia (BR)
Authors: Peter Baumgartner; Sylvie Thiebaux; Felipe Trevizan
Author affiliation: Data61/CSIRO and Research School of Computer Science, ANU, Canberra, Australia
Original format: PDF

Similar literature

1. Learning Algorithms for Discounted MDPs with Constraints [J]. Peter Geibel, Fritz Wysotzki. International Journal of Mathematics, Game Theory and Algebra, 2012, issue 2a3.
2. Solving efficiently Decentralized MDPs with temporal and resource constraints [J]. Aurelie Beynier, Abdel-Illah Mouaddib. Autonomous Agents and Multi-Agent Systems, 2011, issue 3.
3. Synthesis and application of MDPE-g-GMA as reactive compatibilizer in blends of MDPE/PET and MDPE/PA6 [J]. Daneshvar M., Masoomi M. Journal of Applied Polymer Science, 2012, issue 3.
4. Tableaux for Policy Synthesis for MDPs with PCTL* Constraints [C]. Peter Baumgartner, Sylvie Thiebaux, Felipe Trevizan. International Conference on Automated Reasoning with Analytic Tableaux and Related Methods, 2017.
5. Lip Synchronization for ECA Rendering with Self-Adjusted POMDP Policies [D]. Szucs, Tristan. 2019.
6. MDPs with Non-Deterministic Policies [O]. Mahdi Milani Fard, Joelle Pineau.
7. Tableaux for Policy Synthesis for MDPs with PCTL* Constraints [O]. Baumgartner, Peter; Thiébaux, Sylvie; Trevizan, Felipe. 2017.
8. Polynomial-Time Verification of PCTL Properties of MDPs with Convex Uncertainties [R]. Puggelli, A. A.; Li, W.; Sangiovanni-Vincentelli, A. L. 2013.
