
Exploiting structure to efficiently solve large scale partially observable Markov decision processes.


Abstract

Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in real-world problems has been limited by the poor scalability of existing solution algorithms, which can only solve problems with up to ten thousand states. In fact, the complexity of finding an optimal policy for a finite-horizon discrete POMDP is PSPACE-complete. In practice, two important sources of intractability plague most solution algorithms: large policy spaces and large state spaces.

On the other hand, for many real-world POMDPs it is possible to define effective policies with simple rules of thumb. This suggests that we may be able to find small policies that are near optimal. This thesis first presents a Bounded Policy Iteration (BPI) algorithm to robustly find a good policy represented by a small finite state controller. Real-world POMDPs also tend to exhibit structural properties that can be exploited to mitigate the effect of large state spaces. To that effect, a value-directed compression (VDC) technique is also presented to reduce POMDP models to lower-dimensional representations.

In practice, it is critical to simultaneously mitigate the impact of complex policy representations and large state spaces. Hence, this thesis describes three approaches that combine techniques capable of dealing with each source of intractability: VDC with BPI, VDC with Perseus (a randomized point-based value iteration algorithm by Spaan and Vlassis [136]), and state abstraction with Perseus. The scalability of these approaches is demonstrated on two problems with more than 33 million states: synthetic network management and a real-world system designed to assist elderly persons with cognitive deficiencies to carry out simple daily tasks such as hand-washing. This represents an important step towards the deployment of POMDP techniques in ever larger, real-world, sequential decision making problems.
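To make the framework described in the abstract concrete, the sketch below builds a tiny illustrative POMDP (a toy tiger-style problem, not one of the thesis's benchmarks) and evaluates a hand-built finite state controller of the kind BPI searches over, by solving the linear policy-evaluation system V(n, s) = R(s, a_n) + γ Σ_{s'} T(s, a_n, s') Σ_z O(s', a_n, z) V(η(n, z), s'). All model numbers, node choices, and names here are assumptions made for illustration; the thesis's algorithms (BPI, VDC, Perseus, state abstraction) are not reproduced.

```python
# Minimal, self-contained sketch (illustrative only, not from the thesis):
# a tiny tiger-style POMDP and exact evaluation of a small finite state
# controller by solving the linear system V = r + gamma * M * V.
import numpy as np

# States, actions, observations of the toy POMDP
S = ["tiger-left", "tiger-right"]
A = ["listen", "open-left", "open-right"]
Z = ["hear-left", "hear-right"]
gamma = 0.95

# Transition model T[a][s, s']: listening keeps the state,
# opening a door resets the problem (tiger placed uniformly at random).
T = {
    "listen":     np.eye(2),
    "open-left":  np.full((2, 2), 0.5),
    "open-right": np.full((2, 2), 0.5),
}

# Observation model O[a][s', z]: listening is 85% accurate,
# opening a door yields an uninformative observation.
O = {
    "listen":     np.array([[0.85, 0.15], [0.15, 0.85]]),
    "open-left":  np.full((2, 2), 0.5),
    "open-right": np.full((2, 2), 0.5),
}

# Reward model R[a][s]: listening costs a little, opening the wrong door a lot.
R = {
    "listen":     np.array([-1.0, -1.0]),
    "open-left":  np.array([-100.0, 10.0]),
    "open-right": np.array([10.0, -100.0]),
}

# A small deterministic finite state controller (the kind of compact policy
# BPI searches for): node -> action, and (node, observation) -> next node.
act = {0: "listen", 1: "open-right", 2: "open-left"}
succ = {(0, "hear-left"): 1, (0, "hear-right"): 2,
        (1, "hear-left"): 0, (1, "hear-right"): 0,
        (2, "hear-left"): 0, (2, "hear-right"): 0}
N, nS = len(act), len(S)

# Exact policy evaluation: V(n, s) = R(s, a_n)
#   + gamma * sum_{s'} T(s, a_n, s') * sum_z O(s', a_n, z) * V(eta(n, z), s')
# is a linear system in the N * |S| unknowns V(n, s).
M = np.zeros((N * nS, N * nS))
r = np.zeros(N * nS)
for n in range(N):
    a = act[n]
    for s in range(nS):
        r[n * nS + s] = R[a][s]
        for sp in range(nS):
            for zi, z in enumerate(Z):
                n_next = succ[(n, z)]
                M[n * nS + s, n_next * nS + sp] += T[a][s, sp] * O[a][sp, zi]

V = np.linalg.solve(np.eye(N * nS) - gamma * M, r)

# Value of starting the controller in node 0 under a uniform initial belief.
b0 = np.array([0.5, 0.5])
print("V(node 0, uniform belief) =", b0 @ V[0:nS])
```

Roughly speaking, a BPI-style step would then try to improve each node's action and successor choices (allowing stochastic ones) without growing the controller, while VDC would replace the state-indexed vectors above with much lower-dimensional compressed representations; both steps are omitted from this sketch.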

Bibliographic details

  • Author: Poupart, Pascal.
  • Author affiliation: University of Toronto (Canada).
  • Degree grantor: University of Toronto (Canada).
  • Subject: Computer Science.
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 144 p.
  • Total pages: 144
  • Original format: PDF
  • Language: English
  • CLC classification: Automation and computer technology
  • Keywords:
  • Date added: 2022-08-17 11:41:30
