
Robust topological policy iteration for infinite horizon bounded Markov Decision Processes


Abstract

Markov Decision Processes (MDPs) are commonly used to solve sequential decision problems. A less restrictive model is the Bounded-parameter MDP (BMDP), which allows (i) the transition function to be expressed in terms of probability intervals and (ii) reasoning about a robust solution, i.e., the best solution under the worst model. In this paper, we propose Robust Topological Policy Iteration (RTPI), a new policy iteration algorithm for infinite-horizon BMDPs based on a partition of the state space. The empirical results show that the more structured the domain, the better RTPI performs. (C) 2018 Elsevier Inc. All rights reserved.
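
To make "the best solution under the worst model" concrete, the following is a minimal sketch of a single robust Bellman backup over interval transition probabilities. It is not the RTPI algorithm itself, which the abstract does not spell out, and it omits the state-space partition. The names `worst_case_expectation` and `robust_backup`, the dictionary encoding of the intervals, and the greedy ordering used to pick the worst-case distribution are all assumptions made for illustration, as is the requirement that each interval set admits at least one valid distribution.

```python
from typing import Dict, Tuple

# Hypothetical encoding: successor state -> (lower, upper) probability bound.
Intervals = Dict[str, Tuple[float, float]]


def worst_case_expectation(intervals: Intervals, value: Dict[str, float]) -> float:
    """Smallest expected successor value over distributions inside the intervals.

    Assumes the lower bounds sum to at most 1 and the upper bounds to at least 1.
    """
    # Every successor receives at least its lower bound.
    probs = {s: lo for s, (lo, _) in intervals.items()}
    remaining = 1.0 - sum(probs.values())
    # Hand the leftover mass to the lowest-valued successors first.
    for s in sorted(intervals, key=lambda t: value[t]):
        lo, hi = intervals[s]
        extra = min(hi - lo, remaining)
        probs[s] += extra
        remaining -= extra
    return sum(probs[s] * value[s] for s in intervals)


def robust_backup(
    rewards: Dict[str, float],
    transitions: Dict[str, Intervals],
    value: Dict[str, float],
    gamma: float,
) -> Tuple[float, str]:
    """Best action for one state when each action is scored under its worst model."""
    scored = {
        a: rewards[a] + gamma * worst_case_expectation(transitions[a], value)
        for a in transitions
    }
    best = max(scored, key=lambda a: scored[a])
    return scored[best], best


# Illustrative numbers only; they do not come from the paper.
value = {"a": 10.0, "b": 0.0}
transitions = {
    "safe": {"a": (1.0, 1.0)},
    "risky": {"a": (0.2, 0.7), "b": (0.3, 0.8)},
}
rewards = {"safe": 1.0, "risky": 1.0}
best_value, best_action = robust_backup(rewards, transitions, value, gamma=0.5)
```

In this toy state the worst model for the "risky" action pushes as much probability as the intervals allow onto the low-valued successor "b", so the robust backup prefers "safe".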
