IEEE Transactions on Automatic Control

Observation-Based Optimization for POMDPs With Continuous State, Observation, and Action Spaces



Abstract

This paper considers the optimization problem for partially observable Markov decision processes (POMDPs) with continuous state, observation, and action spaces. POMDPs with discrete spaces have emerged as a promising approach to decision systems with imperfect state information. However, many recent applications of POMDPs involve continuous states, observations, and actions. For such problems, owing to the infinite dimensionality of the belief space, existing studies usually discretize the continuous spaces using sufficient or nonsufficient statistics, which may cause the curse of dimensionality and performance degradation. In this paper, based on a sensitivity analysis of the performance criteria, we develop a simulation-based policy iteration algorithm that finds a locally optimal observation-based policy for POMDPs with continuous spaces. The proposed algorithm requires no specific assumptions or prior information and has low computational complexity. A numerical example on a complicated multiple-input multiple-output beamforming problem shows that the algorithm achieves a significant performance improvement.
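
To make the idea of simulation-based, sensitivity-driven optimization of an observation-based policy concrete, the following Python sketch optimizes a scalar linear policy a = θ·y for a toy continuous-state, continuous-observation system. The linear dynamics, quadratic stage cost, policy parameterization, and finite-difference sensitivity estimator are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, horizon=200):
    """Run one trajectory under the observation-based policy a = theta * y
    and return the average quadratic cost (an assumed performance criterion)."""
    x = rng.normal()                          # hidden continuous state
    total_cost = 0.0
    for _ in range(horizon):
        y = x + 0.1 * rng.normal()            # noisy continuous observation
        a = theta * y                         # action depends only on the observation
        total_cost += x**2 + 0.1 * a**2       # stage cost
        x = 0.9 * x + a + 0.1 * rng.normal()  # continuous state transition
    return total_cost / horizon

def sensitivity_estimate(theta, eps=0.05, batch=20):
    """Two-sided finite-difference estimate of the performance sensitivity,
    averaged over simulated trajectories."""
    plus = np.mean([simulate(theta + eps) for _ in range(batch)])
    minus = np.mean([simulate(theta - eps) for _ in range(batch)])
    return (plus - minus) / (2 * eps)

# Policy-iteration-style loop: estimate the sensitivity by simulation,
# then improve the observation-based policy parameter along it.
theta = 0.0
for it in range(50):
    g = sensitivity_estimate(theta)
    theta -= 0.1 * g                          # descend on the average cost
    if it % 10 == 0:
        print(f"iter {it:2d}  theta = {theta:+.3f}  cost ~ {simulate(theta):.3f}")
```

Because the policy acts directly on the raw observation rather than on a discretized belief state, the parameter update avoids the infinite-dimensional belief space entirely; only simulated trajectories and their estimated sensitivities are needed.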
