
Exploiting structure to efficiently solve large scale partially observable Markov decision processes.


Abstract

Partially observable Markov decision processes (POMDPs) provide a natural and principled framework to model a wide range of sequential decision making problems under uncertainty. To date, the use of POMDPs in real-world problems has been limited by the poor scalability of existing solution algorithms, which can only solve problems with up to ten thousand states. In fact, the complexity of finding an optimal policy for a finite-horizon discrete POMDP is PSPACE-complete. In practice, two important sources of intractability plague most solution algorithms: large policy spaces and large state spaces.

On the other hand, for many real-world POMDPs it is possible to define effective policies with simple rules of thumb. This suggests that we may be able to find small policies that are near optimal. This thesis first presents a Bounded Policy Iteration (BPI) algorithm to robustly find a good policy represented by a small finite state controller. Real-world POMDPs also tend to exhibit structural properties that can be exploited to mitigate the effect of large state spaces. To that effect, a value-directed compression (VDC) technique is also presented to reduce POMDP models to lower-dimensional representations.

In practice, it is critical to simultaneously mitigate the impact of complex policy representations and large state spaces. Hence, this thesis describes three approaches that combine techniques capable of dealing with each source of intractability: VDC with BPI, VDC with Perseus (a randomized point-based value iteration algorithm by Spaan and Vlassis [136]), and state abstraction with Perseus. The scalability of these approaches is demonstrated on two problems with more than 33 million states: synthetic network management and a real-world system designed to assist elderly persons with cognitive deficiencies to carry out simple daily tasks such as hand-washing. This represents an important step towards the deployment of POMDP techniques in ever larger, real-world, sequential decision making problems.
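To make the framework described in the abstract concrete, the sketch below builds a tiny illustrative POMDP (a toy tiger-style problem, not one of the thesis's benchmarks) and evaluates a hand-built finite state controller of the kind BPI searches over, by solving the linear policy-evaluation system V(n, s) = R(s, a_n) + γ Σ_{s'} T(s, a_n, s') Σ_z O(s', a_n, z) V(η(n, z), s'). All model numbers, node choices, and names here are assumptions made for illustration; the thesis's algorithms (BPI, VDC, Perseus, state abstraction) are not reproduced.

```python
# Minimal, self-contained sketch (illustrative only, not from the thesis):
# a tiny tiger-style POMDP and exact evaluation of a small finite state
# controller by solving the linear system V = r + gamma * M * V.
import numpy as np

# States, actions, observations of the toy POMDP
S = ["tiger-left", "tiger-right"]
A = ["listen", "open-left", "open-right"]
Z = ["hear-left", "hear-right"]
gamma = 0.95

# Transition model T[a][s, s']: listening keeps the state,
# opening a door resets the problem (tiger placed uniformly at random).
T = {
    "listen":     np.eye(2),
    "open-left":  np.full((2, 2), 0.5),
    "open-right": np.full((2, 2), 0.5),
}

# Observation model O[a][s', z]: listening is 85% accurate,
# opening a door yields an uninformative observation.
O = {
    "listen":     np.array([[0.85, 0.15], [0.15, 0.85]]),
    "open-left":  np.full((2, 2), 0.5),
    "open-right": np.full((2, 2), 0.5),
}

# Reward model R[a][s]: listening costs a little, opening the wrong door a lot.
R = {
    "listen":     np.array([-1.0, -1.0]),
    "open-left":  np.array([-100.0, 10.0]),
    "open-right": np.array([10.0, -100.0]),
}

# A small deterministic finite state controller (the kind of compact policy
# BPI searches for): node -> action, and (node, observation) -> next node.
act = {0: "listen", 1: "open-right", 2: "open-left"}
succ = {(0, "hear-left"): 1, (0, "hear-right"): 2,
        (1, "hear-left"): 0, (1, "hear-right"): 0,
        (2, "hear-left"): 0, (2, "hear-right"): 0}
N, nS = len(act), len(S)

# Exact policy evaluation: V(n, s) = R(s, a_n)
#   + gamma * sum_{s'} T(s, a_n, s') * sum_z O(s', a_n, z) * V(eta(n, z), s')
# is a linear system in the N * |S| unknowns V(n, s).
M = np.zeros((N * nS, N * nS))
r = np.zeros(N * nS)
for n in range(N):
    a = act[n]
    for s in range(nS):
        r[n * nS + s] = R[a][s]
        for sp in range(nS):
            for zi, z in enumerate(Z):
                n_next = succ[(n, z)]
                M[n * nS + s, n_next * nS + sp] += T[a][s, sp] * O[a][sp, zi]

V = np.linalg.solve(np.eye(N * nS) - gamma * M, r)

# Value of starting the controller in node 0 under a uniform initial belief.
b0 = np.array([0.5, 0.5])
print("V(node 0, uniform belief) =", b0 @ V[0:nS])
```

Roughly speaking, a BPI-style step would then try to improve each node's action and successor choices (allowing stochastic ones) without growing the controller, while VDC would replace the state-indexed vectors above with much lower-dimensional compressed representations; both steps are omitted from this sketch.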

Bibliographic details

  • Author: Poupart, Pascal.
  • Author affiliation: University of Toronto (Canada).
  • Degree grantor: University of Toronto (Canada).
  • Subject: Computer Science.
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 144 p.
  • Total pages: 144
  • Original format: PDF
  • Language: English
  • CLC classification: Automation and computer technology
  • Keywords:
  • Date added: 2022-08-17 11:41:30
