Kernel-based approximate dynamic programming using Bellman residual elimination

Abstract

Many sequential decision-making problems related to multi-agent robotic systems can be naturally posed as Markov Decision Processes (MDPs). An important advantage of the MDP framework is the ability to utilize stochastic system models, thereby allowing the system to make sound decisions even if there is randomness in the system evolution over time. Unfortunately, the curse of dimensionality prevents most MDPs of practical size from being solved exactly. One main focus of the thesis is the development of a new family of algorithms for computing approximate solutions to large-scale MDPs. Our algorithms are similar in spirit to Bellman residual methods, which attempt to minimize the error incurred in solving Bellman's equation at a set of sample states. However, by exploiting kernel-based regression techniques (such as support vector regression and Gaussian process regression) with nondegenerate kernel functions as the underlying cost-to-go function approximation architecture, our algorithms are able to construct cost-to-go solutions for which the Bellman residuals are explicitly forced to zero at the sample states. For this reason, we have named our approach Bellman residual elimination (BRE). In addition to developing the basic ideas behind BRE, we present multi-stage and model-free extensions to the approach. The multi-stage extension allows for automatic selection of an appropriate kernel for the MDP at hand, while the model-free extension can use simulated or real state trajectory data to learn an approximate policy when a system model is unavailable.
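The core idea of forcing Bellman residuals to zero at sample states can be illustrated for policy evaluation in a small finite MDP. This is a minimal sketch under stated assumptions, not the thesis's implementation: it assumes a fixed policy with a known transition matrix P, stage costs g, a discount factor alpha < 1, and a Gaussian base kernel on a one-dimensional state coordinate, and it uses plain kernel interpolation (an exact linear solve) in place of the support vector or Gaussian process regression named in the abstract. Writing the cost-to-go as J(i) = <w, Theta(i)> in the kernel's feature space, the Bellman residual at state i becomes <w, Psi(i)> - g(i) with Psi(i) = Theta(i) - alpha * sum_j P[i, j] * Theta(j). Forcing it to zero at the sample states is therefore an interpolation problem with the induced kernel A K A^T, where A = I - alpha P and K is the base-kernel Gram matrix. Because the Gaussian kernel is nondegenerate and A is invertible for alpha < 1, that system has a unique solution. The function names (`gaussian_kernel`, `bre_policy_evaluation`, `bellman_residual`) are hypothetical.

```python
import numpy as np


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian base kernel k(x, y); strictly positive definite on distinct points."""
    d = x[:, None] - y[None, :]
    return np.exp(-d**2 / (2.0 * sigma**2))


def bre_policy_evaluation(P, g, alpha, samples, states, sigma=1.0):
    """Approximate cost-to-go of a fixed policy whose Bellman residual
    J(i) - g(i) - alpha * sum_j P[i, j] * J(j) is zero at every sample state.

    Assumed inputs (illustrative): P is the (n, n) transition matrix under the
    policy, g the (n,) stage costs, samples the integer indices of sample
    states, and states the (n,) coordinates fed to the kernel.
    """
    n = len(g)
    K = gaussian_kernel(states, states, sigma)  # base-kernel Gram matrix
    A = np.eye(n) - alpha * P                   # A @ J = J - alpha * P @ J
    # Psi(i) = Theta(i) - alpha * sum_j P[i, j] * Theta(j) induces the kernel A K A^T.
    K_bellman = A @ K @ A.T
    # Residual at the samples equals K_bellman[S, S] @ lam - g[S]; set it to zero.
    lam = np.linalg.solve(K_bellman[np.ix_(samples, samples)], g[samples])
    # J(i) = sum_k lam_k * <Psi(s_k), Theta(i)> = sum_k lam_k * (A K)[s_k, i].
    return (A @ K)[samples, :].T @ lam


def bellman_residual(J, P, g, alpha):
    """Bellman residual of J under the fixed policy, evaluated at every state."""
    return J - g - alpha * P @ J


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, alpha = 20, 0.9
    P = rng.random((n, n))
    P /= P.sum(axis=1, keepdims=True)  # make each row a probability distribution
    g = rng.random(n)
    states = np.arange(n, dtype=float)
    samples = np.array([0, 4, 9, 14, 19])
    J = bre_policy_evaluation(P, g, alpha, samples, states, sigma=1.0)
    res = bellman_residual(J, P, g, alpha)
    assert np.allclose(res[samples], 0.0, atol=1e-9)  # eliminated at the samples
    print("max |residual| at samples:", np.abs(res[samples]).max())
    print("max |residual| elsewhere: ", np.abs(np.delete(res, samples)).max())
```

If every state is sampled, the residual is zero everywhere, so the sketch reproduces the exact policy cost solve(I - alpha P, g); with fewer samples, the residual is only guaranteed to vanish at the sampled states.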

Bibliographic record

  • Author

    Bethke, Brett (Brett M.)

  • Year: 2010
  • Original format: PDF
  • Language: English
