Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces

Abstract

Online solvers for partially observable Markov decision processes have been applied to problems with large discrete state spaces, but continuous state, action, and observation spaces remain a challenge. This paper begins by investigating double progressive widening (DPW) as a solution to this challenge. However, we prove that this modification alone is not sufficient: the belief representations in the search tree collapse to a single particle, causing the algorithm to converge to a policy that is suboptimal regardless of the computation time. This paper proposes and evaluates two new algorithms, POMCPOW and PFT-DPW, that overcome this deficiency by using weighted particle filtering. Simulation results show that these modifications allow the algorithms to succeed where previous approaches fail.
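
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation of POMCPOW or PFT-DPW. It only shows the commonly used progressive widening test (a node may gain a new child while its child count is at most k·N^α, where N is its visit count) and a weighted particle belief in which each particle's weight is proportional to the likelihood of the observation. The function names, the default values of `k` and `alpha`, and the one-dimensional Gaussian observation model are all assumptions made for illustration.

```python
import math


def should_widen(num_children: int, num_visits: int,
                 k: float = 4.0, alpha: float = 0.125) -> bool:
    """Progressive widening test: allow a new child while |C| <= k * N^alpha.

    k and alpha are illustrative defaults, not values taken from the paper.
    """
    return num_children <= k * num_visits ** alpha


def gaussian_likelihood(observation: float, state: float,
                        sigma: float = 1.0) -> float:
    """Hypothetical observation model: unnormalized Gaussian around the state."""
    return math.exp(-0.5 * ((observation - state) / sigma) ** 2)


def reweight(particles, observation, obs_likelihood=gaussian_likelihood):
    """Return normalized particle weights proportional to the observation likelihood."""
    weights = [obs_likelihood(observation, s) for s in particles]
    total = sum(weights)
    if total == 0.0:
        # Degenerate case: fall back to uniform weights.
        return [1.0 / len(particles)] * len(particles)
    return [w / total for w in weights]
```

Because every particle keeps a weight, a belief node reached through a continuous observation can hold many state particles, which is the property the abstract says unweighted beliefs lose when they collapse to a single particle.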

Bibliographic record

Source: International Conference on Automated Planning and Scheduling | 2018 | 539p | 5 pages
Authors: Zachary N. Sunberg; Mykel J. Kochenderfer
Original format: PDF
Chinese Library Classification: TP2-53

Similar literature

1. Observation-Based Optimization for POMDPs With Continuous State, Observation, and Action Spaces [J]. Jiang Xiaofeng, Yang Jian, Tan Xiaobin. IEEE Transactions on Automatic Control. 2019, No. 5
2. MILP based value backups in partially observed Markov decision processes (POMDPs) with very large or continuous action and observation spaces [J]. Rakshita Agrawal, Matthew J. Realff, Jay H. Lee. Computers & Chemical Engineering. 2013, No. sepa13
3. Online Algorithms for POMDPs with Continuous State, Action, and Observation Spaces [C]. Zachary N. Sunberg, Mykel J. Kochenderfer. International Conference on Automated Planning and Scheduling. 2018
4. Creating continuous design spaces for interactive genetic algorithms with layered, correlated, pattern functions [D]. Lewis, Matthew Richard. 2001
5. Online Planning Algorithms for POMDPs [O]. Stéphane Ross, Joelle Pineau, Sébastien Paquet
6. DESPOT-Alpha: Online POMDP Planning with Large State and Observation Spaces [O]. Neha Priyadarshini Garg, David Hsu, Wee Sun Lee. 2019
