International Journal of Uncertainty, Fuzziness, and Knowledge-based Systems

BOUNDED-PARAMETER PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES: FRAMEWORK AND ALGORITHM


Abstract

Partially observable Markov decision processes (POMDPs) are a powerful framework for planning under uncertainty. However, it is usually impractical to model a real-life situation precisely with a POMDP whose parameters are exact, for reasons such as limited data for learning the model and the inability of exact POMDPs to capture dynamic situations. In this paper, assuming that the parameters of a POMDP are imprecise but bounded, we formulate the framework of bounded-parameter partially observable Markov decision processes (BPOMDPs). We propose a modified value iteration as a basic strategy for handling parameter imprecision in BPOMDPs. In addition, we design the UL-based value iteration algorithm, in which each value backup is based on two sets of vectors called the U-set and the L-set, and we propose four strategies for computing these sets. We theoretically analyze the computational complexity and the reward loss of the algorithm, and show its effectiveness and robustness empirically.
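To make the idea of "imprecise but bounded" parameters concrete, the following is a minimal sketch of one building block that bounded-parameter models commonly rely on: given interval bounds on a probability distribution and a value for each outcome, find the largest or smallest expected value that any distribution inside the bounds can produce. This is not the paper's UL-based algorithm, whose details the abstract does not give. The function name `extreme_expectation` and the example numbers are assumptions made purely for illustration.

```python
from typing import Sequence


def extreme_expectation(
    lower: Sequence[float],
    upper: Sequence[float],
    values: Sequence[float],
    maximize: bool = True,
) -> float:
    """Best-case (or worst-case) expected value over all distributions p
    with lower[i] <= p[i] <= upper[i] and sum(p) == 1.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if sum(lower) > 1.0 + 1e-9 or sum(upper) < 1.0 - 1e-9:
        raise ValueError("bounds admit no valid probability distribution")

    # Start every outcome at its lower bound, then hand out the remaining
    # probability mass greedily, favouring the highest-valued outcomes when
    # maximizing and the lowest-valued outcomes when minimizing.
    probs = list(lower)
    slack = 1.0 - sum(lower)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=maximize)
    for i in order:
        extra = min(upper[i] - lower[i], slack)
        probs[i] += extra
        slack -= extra

    return sum(p * v for p, v in zip(probs, values))


# Assumed example: three successor states with interval transition bounds.
lower = [0.2, 0.1, 0.3]
upper = [0.6, 0.5, 0.7]
values = [10.0, 0.0, 5.0]

best = extreme_expectation(lower, upper, values, maximize=True)
worst = extreme_expectation(lower, upper, values, maximize=False)
print(f"expected value lies in [{worst:.2f}, {best:.2f}]")
```

With these assumed bounds the expected value can range from about 3.5 to about 7.5, which illustrates why a single exact value function no longer suffices once parameters are only known up to intervals, and why keeping separate upper and lower quantities is a natural design choice.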
