Real-time dynamic programming for Markov decision processes with imprecise probabilities

Karina V. Delgado; Leliane N. de Barros; Daniel B. Dias; Scott Sanner

Abstract

Markov Decision Processes have become the standard model for probabilistic planning. However, in many practical problems the estimates of the transition probabilities are inaccurate, for example because of conflicting elicitations from experts or insufficient state transition information. Markov Decision Processes with Imprecise Transition Probabilities (MDP-IPs) were introduced to obtain a robust policy when the transition probabilities are uncertain. Although a symbolic dynamic programming algorithm for MDP-IPs (called SPUDD-IP) has been proposed that can solve problems with up to 22 state variables, solving MDP-IP problems remains time-consuming in practice. In this paper we propose efficient algorithms for a more general class of MDP-IPs, called Stochastic Shortest Path MDP-IPs (SSP MDP-IPs); these algorithms use initial state information to solve complex problems by focusing on reachable states. The (L)RTDP-IP algorithm, a (Labeled) Real-Time Dynamic Programming algorithm for SSP MDP-IPs, is proposed together with three different methods for sampling the next state. We show that (L)RTDP-IP converges under any of these three methods, even though the Bellman backups for this class of problems prescribe a minimax optimization. As far as we are aware, this is the first asynchronous algorithm for SSP MDP-IPs given in terms of a general set of probability constraints that requires non-linear optimization over imprecise probabilities in the Bellman backup. Our results show up to three orders of magnitude speedup for (L)RTDP-IP when compared with the SPUDD-IP algorithm.
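
The minimax Bellman backup and trial-based updates mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' (L)RTDP-IP implementation: it assumes simple interval bounds on each transition probability (the paper works with a general set of probability constraints), it samples the next state from the worst-case distribution (one possible rule, chosen here for illustration, not necessarily one of the paper's three methods), and it omits labeling. The toy model, the names TRANSITIONS, worst_case_dist, backup and rtdp_ip, and all numbers are hypothetical.

```python
import random

# Hypothetical toy SSP MDP-IP (not from the paper): states 0, 1, 2; state 2 is the goal.
# TRANSITIONS[s][a] maps each successor to (lower, upper) bounds on P(s' | s, a).
TRANSITIONS = {
    0: {"safe": {1: (1.0, 1.0)},
        "risky": {2: (0.5, 0.9), 0: (0.1, 0.5)}},
    1: {"go": {2: (0.7, 1.0), 1: (0.0, 0.3)}},
}
COST = 1.0  # assumed uniform cost per action
GOAL = 2


def worst_case_dist(intervals, V):
    """Inner maximisation: the distribution inside the intervals that
    maximises expected cost-to-go. Start from the lower bounds, then give
    the remaining probability mass to the costliest successors first."""
    p = {s: lo for s, (lo, _) in intervals.items()}
    slack = 1.0 - sum(p.values())
    for s in sorted(intervals, key=lambda x: V[x], reverse=True):
        add = min(intervals[s][1] - p[s], slack)
        p[s] += add
        slack -= add
    return p


def q_value(s, a, V):
    """Worst-case Q-value of action a in state s, plus the maximising distribution."""
    p = worst_case_dist(TRANSITIONS[s][a], V)
    return COST + sum(prob * V[s2] for s2, prob in p.items()), p


def backup(s, V):
    """Minimax Bellman backup: minimise over actions the worst-case Q-value."""
    best = min(TRANSITIONS[s], key=lambda a: q_value(s, a, V)[0])
    V[s], p = q_value(s, best, V)
    return best, p


def rtdp_ip(start, trials=200, max_steps=50, seed=0):
    """Trial-based asynchronous value updates from the initial state."""
    rng = random.Random(seed)
    V = {s: 0.0 for s in (0, 1, 2)}  # optimistic (admissible) initial values
    for _ in range(trials):
        s = start
        for _ in range(max_steps):
            if s == GOAL:
                break
            _, p = backup(s, V)
            # Assumption: sample the successor from the worst-case distribution.
            states = list(p)
            s = rng.choices(states, weights=[p[x] for x in states])[0]
    return V
```

On this toy model, rtdp_ip(0) drives the value of state 0 to approximately 2.0, the worst-case expected cost of repeatedly taking the "risky" action. State 1 is rarely visited under the greedy policy, so its value is generally not fully converged, which reflects the focus on reachable, relevant states.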


Bibliographic details

Source: Artificial intelligence, 2016, Issue 1, pp. 192-223 (32 pages)
Authors: Karina V. Delgado; Leliane N. de Barros; Daniel B. Dias; Scott Sanner
Original format: PDF
Text language: English
Keywords: Probabilistic planning; Markov decision process; Robust planning
Added to database: 2022-08-18 02:05:26
