
Model learning and application of partially observable Markov decision processes.


Abstract

The partially observable Markov decision process (POMDP) has been widely used in robot navigation and decision making. Learning an accurate POMDP model is a prerequisite for model-based POMDP applications. Given the definitions of states, actions, and observations, learning a POMDP model amounts to inferring the state-transition probabilities and the state-dependent observation probabilities. This dissertation presents three Bayesian methods for learning a POMDP model, based on MEDUSA (Markovian exploration with decision based on the use of sampled models algorithm), multi-task learning (MTL), and life-long learning (LLL). These learning algorithms are introduced within two POMDP applications: adaptive landmine sensing and online target searching.

After presenting background material on POMDPs, MEDUSA, MTL, and LLL in Chapter 1, Chapter 2 addresses multimodality sensing of landmines using two sensors. We first assume adequate data are available for learning an underlying POMDP model of mines and clutter, and describe the method of building an appropriate model. This is then generalized by assuming the training data for mines and clutter are not available a priori, so the underlying POMDP model is learned online with a modified MEDUSA approach. An oracle is employed adaptively to reveal the labels of the underground mines/clutter under interrogation, and the posterior of the underlying POMDP model is updated based on the interrogation result. Example results using measured sensing data from two sensors, for buried mines and clutter, demonstrate the performance of the algorithm.

Chapters 3-5 address online target searching in an unknown environment. POMDPs and a simultaneous localization and mapping (SLAM) algorithm are combined in this application to navigate a robot (searching for an acoustic source) and to build a global map simultaneously.
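Bayesian POMDP model learning of the kind described above typically maintains Dirichlet distributions over the rows of the transition and observation matrices and updates their pseudo-counts as experience (or oracle feedback) arrives. The following is a minimal illustrative sketch of that mechanism for transition probabilities only; the state/action sizes and counts are hypothetical, not the dissertation's actual model.

```python
import numpy as np

n_states, n_actions = 3, 2

# One Dirichlet pseudo-count vector per (state, action) pair; a uniform
# prior alpha = 1 encodes no initial preference among successor states.
alpha = np.ones((n_states, n_actions, n_states))

def update(s, a, s_next):
    """Incorporate one observed (or oracle-labeled) transition (s, a) -> s_next."""
    alpha[s, a, s_next] += 1.0

def transition_probs(s, a):
    """Posterior-mean estimate of T(s' | s, a) under the Dirichlet posterior."""
    return alpha[s, a] / alpha[s, a].sum()

# Repeated evidence for one transition concentrates the posterior mean on it.
for _ in range(8):
    update(0, 1, 2)
print(transition_probs(0, 1))  # mass concentrates on successor state 2
```

The observation probabilities are learned the same way, with one Dirichlet per (state) or (state, action) row over the observation alphabet; sampling whole models from these posteriors is what MEDUSA-style learners do to decide where to act next.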
Chapter 3 introduces the SLAM algorithm and proposes a geometric map representation, in which a map is represented by a set of geometric units. Chapter 4 presents the online target-searching framework, based on a modified MEDUSA and a grid-based SLAM, under the assumption that all the possible subworlds that may be encountered are available: an accurate POMDP model for each possible subworld is built before searching, and the modified MEDUSA is run for each subworld during the search to recognize the correct underlying model. The assumption of knowing all the possible subworlds a priori is removed in Chapter 5, where two transfer-learning approaches, multi-task learning and life-long learning, are proposed for learning a POMDP model when the training data from a single task are insufficient. The matrix stick-breaking process prior employed in the algorithms provides a flexible sharing structure, allowing two learning tasks to share only a subset of states, with the associated state-transition probabilities and observation probabilities, instead of sharing the entire POMDP model. Results for several simulated environments and for a real environment show the effectiveness of the framework and the algorithms.
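The matrix stick-breaking process prior mentioned above builds on the standard stick-breaking construction for random mixture weights. The sketch below shows only that underlying construction in its basic truncated form (it is not the dissertation's matrix variant, which arranges such weights per task and per shared component); the concentration parameter and truncation level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking_weights(concentration, n_sticks):
    """Truncated stick-breaking construction:
    v_k ~ Beta(1, concentration),  w_k = v_k * prod_{j<k} (1 - v_j).
    Larger concentration spreads mass over more components."""
    v = rng.beta(1.0, concentration, size=n_sticks)
    # Length of stick remaining before each break: 1, (1-v_1), (1-v_1)(1-v_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

w = stick_breaking_weights(concentration=2.0, n_sticks=20)
# w is nonnegative and sums to less than 1 (the truncated tail holds the rest).
```

In a matrix variant, a grid of such weights lets two tasks place high probability on the same component for some states and on task-specific components for others, which is what permits sharing only a subset of the POMDP's states across tasks.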

Bibliographic details

  • Author

    He, Lihan

  • Affiliation

    Duke University

  • Degree-granting institution: Duke University
  • Subject: Engineering, Electronics and Electrical; Artificial Intelligence
  • Degree: Ph.D.
  • Year: 2008
  • Pagination: 156 p.
  • Total pages: 156
  • Format: PDF
  • Language: eng
  • Classification (CLC): Radio electronics and telecommunications; Artificial intelligence theory
  • Keywords

  • Added to database: 2022-08-17 11:38:37
