IEEE Transactions on Automatic Control

Observation-Based Optimization for POMDPs With Continuous State, Observation, and Action Spaces



Abstract

This paper considers the optimization problem for partially observable Markov decision processes (POMDPs) with continuous state, observation, and action spaces. POMDPs with discrete spaces have emerged as a promising approach to decision systems with imperfect state information. However, many recent applications of POMDPs involve continuous states, observations, and actions. For such problems, owing to the infinite dimensionality of the belief space, existing studies usually discretize the continuous spaces using sufficient or nonsufficient statistics, which may cause the curse of dimensionality and performance degradation. In this paper, based on a sensitivity analysis of the performance criteria, we develop a simulation-based policy iteration algorithm that finds a locally optimal observation-based policy for POMDPs with continuous spaces. The proposed algorithm requires no specific assumptions or prior information and has low computational complexity. A numerical example on a complicated multiple-input multiple-output beamforming problem shows that the algorithm achieves a significant performance improvement.
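
To make the idea of simulation-based, sensitivity-driven optimization of an observation-based policy concrete, the following Python sketch optimizes a scalar linear policy a = θ·y for a toy continuous-state, continuous-observation system. The linear dynamics, quadratic stage cost, policy parameterization, and finite-difference sensitivity estimator are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta, horizon=200):
    """Run one trajectory under the observation-based policy a = theta * y
    and return the average quadratic cost (an assumed performance criterion)."""
    x = rng.normal()                          # hidden continuous state
    total_cost = 0.0
    for _ in range(horizon):
        y = x + 0.1 * rng.normal()            # noisy continuous observation
        a = theta * y                         # action depends only on the observation
        total_cost += x**2 + 0.1 * a**2       # stage cost
        x = 0.9 * x + a + 0.1 * rng.normal()  # continuous state transition
    return total_cost / horizon

def sensitivity_estimate(theta, eps=0.05, batch=20):
    """Two-sided finite-difference estimate of the performance sensitivity,
    averaged over simulated trajectories."""
    plus = np.mean([simulate(theta + eps) for _ in range(batch)])
    minus = np.mean([simulate(theta - eps) for _ in range(batch)])
    return (plus - minus) / (2 * eps)

# Policy-iteration-style loop: estimate the sensitivity by simulation,
# then improve the observation-based policy parameter along it.
theta = 0.0
for it in range(50):
    g = sensitivity_estimate(theta)
    theta -= 0.1 * g                          # descend on the average cost
    if it % 10 == 0:
        print(f"iter {it:2d}  theta = {theta:+.3f}  cost ~ {simulate(theta):.3f}")
```

Because the policy acts directly on the raw observation rather than on a discretized belief state, the parameter update avoids the infinite-dimensional belief space entirely; only simulated trajectories and their estimated sensitivities are needed.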
