Robotics: Science and Systems Conference; June 8-11, 2005; Cambridge, MA (US)

Robot Planning in Partially Observable Continuous Domains

Abstract

We present a value iteration algorithm for learning to act in Partially Observable Markov Decision Processes (POMDPs) with continuous state spaces. Mainstream POMDP research focuses on the discrete case, which complicates its application to, e.g., robotic problems that are naturally modeled using continuous state spaces. The main difficulty in defining a (belief-based) POMDP in a continuous state space is that expected values over states must be defined using integrals that, in general, cannot be computed in closed form. In this paper, we first show that the optimal finite-horizon value function over the continuous infinite-dimensional POMDP belief space is piecewise linear and convex, and is defined by a finite set of supporting α-functions that are analogous to the α-vectors (hyperplanes) defining the value function of a discrete-state POMDP. Second, we show that, for a fairly general class of POMDP models in which all functions of interest are modeled by Gaussian mixtures, all belief updates and value iteration backups can be carried out analytically and exactly. A crucial difference with respect to the α-vectors of the discrete case is that, in the continuous case, the α-functions will typically grow in complexity (e.g., in the number of components) in each value iteration. Finally, we demonstrate Perseus, our previously proposed randomized point-based value iteration algorithm, in a simple robot planning problem with a continuous domain, where encouraging results are observed.
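To illustrate why Gaussian mixtures make the integrals tractable, here is a minimal sketch, not the paper's implementation. It assumes a one-dimensional state and represents both a belief and an α-function as lists of hypothetical `(weight, mean, variance)` tuples; the function names are illustrative. It relies on the standard identity that the integral of the product of two Gaussian densities N(s; m1, v1) and N(s; m2, v2) over s equals N(m1; m2, v1 + v2). It also shows the growth in complexity mentioned above: the pointwise product of two mixtures has one component per pair of input components.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def expected_value(alpha, belief):
    """Closed-form integral of alpha(s) * belief(s) over s.

    Both arguments are mixtures given as lists of (weight, mean, variance).
    Each pair of components contributes w_a * w_b * N(m_a; m_b, v_a + v_b).
    """
    return sum(
        wa * wb * gauss_pdf(ma, mb, va + vb)
        for wa, ma, va in alpha
        for wb, mb, vb in belief
    )


def product_mixture(f, g):
    """Pointwise product of two mixtures, returned as a new mixture.

    The result has len(f) * len(g) components, which is the source of the
    growth in the number of components from one step to the next.
    """
    out = []
    for wf, mf, vf in f:
        for wg, mg, vg in g:
            var = vf * vg / (vf + vg)
            mean = (mf * vg + mg * vf) / (vf + vg)
            weight = wf * wg * gauss_pdf(mf, mg, vf + vg)
            out.append((weight, mean, var))
    return out


# Illustrative numbers only (assumed, not taken from the paper).
belief = [(0.5, -1.0, 0.5), (0.5, 2.0, 1.0)]   # weights sum to 1
alpha = [(1.0, 0.0, 1.0), (-0.5, 3.0, 2.0)]    # unnormalized; weights may be negative

print("expected value:", expected_value(alpha, belief))
print("components in product:", len(product_mixture(alpha, belief)))
```

In practice the component count would have to be kept in check by some reduction step; the sketch leaves that out.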