
Bounded-Parameter Partially Observable Markov Decision Processes


Abstract

The partially observable Markov decision process (POMDP) is widely regarded as a powerful model for planning under uncertainty. In practice, however, a POMDP with exact parameters can rarely model a real-life situation precisely, for reasons such as limited data for learning the model. In this paper, assuming that the parameters of a POMDP are imprecise but bounded, we formulate the framework of bounded-parameter partially observable Markov decision processes (BPOMDPs). We propose a modified value iteration as a basic strategy for handling parameter imprecision in BPOMDPs. In addition, we design the UL-based value iteration algorithm, in which each value backup is based on two sets of vectors called the U-set and the L-set. We propose four typical strategies for setting the U-set and the L-set, some of which guarantee that the algorithm implements the modified value iteration. We theoretically analyze the computational complexity and the reward loss of the algorithm. Empirical studies demonstrate its effectiveness and robustness.
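To make "imprecise but bounded" parameters concrete, the following is a minimal sketch of one ingredient that interval-bounded models commonly need: the worst-case expected value of a next-state value function when each transition probability is only known to lie in an interval. It is not the UL-based value iteration from the abstract, whose details are not given here. The function name `worst_case_expectation`, the interval representation, and the numbers are assumptions chosen for illustration. The approach is the standard greedy ordering used for interval-bounded MDPs: start every probability at its lower bound, then hand the remaining mass to the lowest-value states first.

```python
from typing import List, Tuple


def worst_case_expectation(
    lower: List[float], upper: List[float], values: List[float]
) -> Tuple[float, List[float]]:
    """Minimise sum(p[i] * values[i]) subject to
    lower[i] <= p[i] <= upper[i] and sum(p) == 1.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if sum(lower) > 1.0 + 1e-9 or sum(upper) < 1.0 - 1e-9:
        raise ValueError("bounds admit no valid probability distribution")

    # Every state receives at least its lower bound.
    p = list(lower)
    remaining = 1.0 - sum(lower)

    # Give the leftover mass to the lowest-value states first.
    for i in sorted(range(len(values)), key=lambda k: values[k]):
        extra = min(upper[i] - lower[i], remaining)
        p[i] += extra
        remaining -= extra

    expectation = sum(pi * vi for pi, vi in zip(p, values))
    return expectation, p


# Assumed example: three next states with interval-bounded probabilities.
lower = [0.2, 0.3, 0.1]
upper = [0.6, 0.7, 0.5]
values = [10.0, 0.0, 5.0]
value, dist = worst_case_expectation(lower, upper, values)
print(value, dist)
```

Sorting in descending order of value instead would give the best-case expectation, so the same routine can bracket a backup from both sides under these assumptions.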
