The Journal of Artificial Intelligence Research

Finding Approximate POMDP Solutions Through Belief Compression

Abstract

Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are generally considered to be intractable for large models. The intractability of these algorithms is to a large extent a consequence of computing an exact, optimal policy over the entire belief space. However, in real-world POMDP problems, computing the optimal policy for the full belief space is often unnecessary for good control even for problems with complicated policy classes. The beliefs experienced by the controller often lie near a structured, low-dimensional subspace embedded in the high-dimensional belief space. Finding a good approximation to the optimal value function for only this subspace can be much easier than computing the full value function. We introduce a new method for solving large-scale POMDPs by reducing the dimensionality of the belief space. We use Exponential family Principal Components Analysis (Collins, Dasgupta, & Schapire, 2002) to represent sparse, high-dimensional belief spaces using small sets of learned features of the belief state. We then plan only in terms of the low-dimensional belief features. By planning in this low-dimensional space, we can find policies for POMDP models that are orders of magnitude larger than models that can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and on mobile robot navigation tasks.
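To make the belief-compression idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation: it assumes an E-PCA-style factorization with an exponential link and a Poisson-style loss, initialized from an SVD of log-beliefs and refined by gradient descent on a toy set of sparse beliefs. The function names epca_compress and reconstruct, the toy data, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def epca_compress(B, k, iters=500, lr=1e-3):
    """Learn a k-dimensional E-PCA-style factorization of belief samples.

    B : (n_states, n_samples) array whose columns are probability distributions.
    Returns U (n_states, k) and Z (k, n_samples) such that exp(U @ Z),
    column-normalized, approximates B.
    """
    # Initialize from a plain SVD of log-beliefs, then refine under a
    # Poisson-style loss  sum(exp(U@Z) - B * (U@Z))  by gradient descent.
    L = np.log(B + 1e-6)
    u, s, vt = np.linalg.svd(L, full_matrices=False)
    U = u[:, :k] * np.sqrt(s[:k])
    Z = np.sqrt(s[:k])[:, None] * vt[:k, :]
    m = B.shape[1]
    for _ in range(iters):
        G = np.exp(U @ Z) - B                  # gradient of the loss w.r.t. (U @ Z)
        dU, dZ = G @ Z.T / m, U.T @ G / m      # chain rule through the factorization
        U -= lr * dU
        Z -= lr * dZ
    return U, Z

def reconstruct(U, Z):
    """Map low-dimensional coordinates back to normalized beliefs."""
    B_hat = np.exp(U @ Z)
    return B_hat / B_hat.sum(axis=0, keepdims=True)

if __name__ == "__main__":
    # Toy data: sparse beliefs over 50 states that all lie near a 3-dimensional
    # subspace (mixtures of 3 fixed sparse "mode" distributions).
    rng = np.random.default_rng(1)
    n_states, n_samples, k = 50, 200, 3
    modes = rng.dirichlet(np.full(n_states, 0.05), size=k)   # (k, n_states)
    weights = rng.dirichlet(np.ones(k), size=n_samples)      # (n_samples, k)
    B = (weights @ modes).T                                   # (n_states, n_samples)

    U, Z = epca_compress(B, k)
    err = np.abs(reconstruct(U, Z) - B).sum(axis=0).mean()
    print(f"mean L1 reconstruction error over {n_samples} beliefs: {err:.4f}")
```

In the setting the abstract describes, the low-dimensional coordinates Z would then serve as the representation in which planning is carried out; the sketch above stops at compression and reconstruction.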
