Conference: Annual American Control Conference

Compositional planning in Markov decision processes: Temporal abstraction meets generalized logic composition

Abstract

In hierarchical planning for Markov decision processes (MDPs), temporal abstraction allows planning with macro-actions that take place at different time scales in the form of sequential composition. In this paper, we propose a novel approach to compositional reasoning and hierarchical planning for MDPs under co-safe temporal logic constraints. In addition to sequential composition, we introduce a composition of policies based on generalized logic composition: given sub-policies for sub-tasks and a new task expressed as a logic composition of sub-tasks, a semi-optimal policy, which is optimal when planning with only sub-policies, can be obtained by simply composing sub-policies. Thus, a synthesis algorithm is developed to compute optimal policies efficiently by planning with primitive actions, policies for sub-tasks, and the compositions of sub-policies, maximizing the probability of satisfying constraints specified in a fragment of co-safe temporal logic. We demonstrate the correctness and efficiency of the proposed method in stochastic planning examples with a single agent and multiple task specifications.
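The gap between a semi-optimal policy (planning with sub-policies only) and an optimal one (planning with primitive actions as well) can be illustrated on a toy problem. The sketch below is not the paper's synthesis algorithm. It assumes the co-safe task has already been reduced to reaching a goal state, and it uses plain value iteration for the maximal reachability probability. The four-state MDP, the action names, and the macro-action `opt_a` (which stands for "play `a` until the goal or the trap is reached") are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# model[state][action] -> list of (next_state, probability).
# For a macro-action, the list is its multi-step outcome distribution.
Model = Dict[int, Dict[str, List[Tuple[int, float]]]]


def max_reach_prob(model: Model, goal: Set[int], n_iter: int = 200) -> Dict[int, float]:
    """Value iteration for the maximal probability of reaching `goal`."""
    values = {s: (1.0 if s in goal else 0.0) for s in model}
    for _ in range(n_iter):
        for s, actions in model.items():
            if s in goal or not actions:
                continue  # goal and dead-end states keep their value
            values[s] = max(
                sum(p * values[t] for t, p in outcomes)
                for outcomes in actions.values()
            )
    return values


# Hypothetical MDP: 0 = start, 1 = intermediate, 2 = goal, 3 = trap.
PRIMITIVE: Model = {
    0: {"a": [(1, 0.9), (3, 0.1)], "b": [(2, 0.5), (3, 0.5)]},
    1: {"a": [(2, 0.8), (3, 0.2)], "b": [(2, 0.5), (0, 0.5)]},
    2: {},
    3: {},
}

# Hypothetical sub-policy "always play a", summarized as a macro-action.
OPTION_ONLY: Model = {
    0: {"opt_a": [(2, 0.72), (3, 0.28)]},  # 0.9 * 0.8 = 0.72
    1: {"opt_a": [(2, 0.8), (3, 0.2)]},
    2: {},
    3: {},
}

# Planning with primitive actions and the macro-action together.
COMBINED: Model = {s: {**PRIMITIVE[s], **OPTION_ONLY[s]} for s in PRIMITIVE}

GOAL = {2}

if __name__ == "__main__":
    semi = max_reach_prob(OPTION_ONLY, GOAL)
    full = max_reach_prob(COMBINED, GOAL)
    print(round(semi[0], 3))  # 0.72  (sub-policy only)
    print(round(full[0], 3))  # 0.818 (primitive actions allowed too)
```

In this toy instance the sub-policy alone reaches the goal from state 0 with probability 0.72, while mixing in primitive actions raises it to 9/11, because action `b` in state 1 offers a retry through state 0 that the fixed sub-policy never uses.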