首页> 外文期刊>Artificial intelligence >Real-time dynamic programming for Markov decision processes with imprecise probabilities
【24h】

Real-time dynamic programming for Markov decision processes with imprecise probabilities

机译:概率不精确的马尔可夫决策过程的实时动态编程

获取原文
获取原文并翻译 | 示例
       

摘要

Markov Decision Processes have become the standard model for probabilistic planning. However, when applied to many practical problems, the estimates of transition probabilities are inaccurate. This may be due to conflicting elicitations from experts or insufficient state transition information. The Markov Decision Process with Imprecise Transition Probabilities (MDP-IPs) was introduced to obtain a robust policy where there is uncertainty in the transition. Although it has been proposed a symbolic dynamic programming algorithm for MDP-IPs (called SPUDD-IP) that can solve problems up to 22 state variables, in practice, solving MDP-IP problems is time-consuming. In this paper we propose efficient algorithms for a more general class of MDP-IPs, called Stochastic Shortest Path MDP-IPs (SSP MDP-IPs) that use initial state information to solve complex problems by focusing on reachable states. The (L)RTDP-IP algorithm, a (Labeled) Real Time Dynamic Programming algorithm for SSP MDP-IPs, is proposed together with three different methods for sampling the next state. It is shown here that the convergence of (L)RTDP-iP can be obtained by using any of these three methods, although the Bellman backups for this class of problems prescribe a minimax optimization. As far as we are aware, this is the first asynchronous algorithm for SSP MDP-IPs given in terms of a general set of probability constraints that requires non-linear optimization over imprecise probabilities in the Bellman backup. Our results show up to three orders of magnitude speedup for (L)RTDP-IP when compared with the SPUDD-IP algorithm.
机译:马尔可夫决策过程已成为概率计划的标准模型。但是,当应用于许多实际问题时,过渡概率的估计是不准确的。这可能是由于来自专家的冲突或状态转换信息不足所致。引入具有不精确过渡概率(MDP-IP)的马尔可夫决策过程,以获得在过渡过程中存在不确定性的稳健政策。尽管已经提出了一种用于MDP-IP的符号动态编程算法(称为SPUDD-IP),它可以解决多达22个状态变量的问题,但在实践中,解决MDP-IP问题非常耗时。在本文中,我们为更通用的MDP-IP类(称为随机最短路径MDP-IP(SSP MDP-IP))提出了有效的算法,该算法使用初始状态信息通过关注可达状态来解决复杂问题。提出了(L)RTDP-IP算法(一种用于SSP MDP-IP的(标签)实时动态编程算法),以及三种用于采样下一状态的方法。此处显示,可以通过使用这三种方法中的任何一种来获得(L)RTDP-iP的收敛性,尽管针对此类问题的Bellman备份规定了minimax优化。据我们所知,这是针对SSP MDP-IP的第一个异步算法,它是根据概率约束的一般集合给出的,该集合要求对Bellman备份中的不精确概率进行非线性优化。我们的结果表明,与SPUDD-IP算法相比,(L)RTDP-IP的速度提高了三个数量级。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号