Adaptive Dynamic Programming and Reinforcement Learning, 2009 (ADPRL '09)

A convergent recursive least squares approximate policy iteration algorithm for multi-dimensional Markov decision process with continuous state and action spaces



Abstract

In this paper, we present a recursive least squares approximate policy iteration (RLSAPI) algorithm for infinite-horizon multi-dimensional Markov decision processes with continuous state and action spaces. Under certain structural assumptions on the value functions and policy spaces, the approximate policy iteration algorithm is provably convergent in the mean: the mean absolute deviation of the approximate policy value function from the optimal value function goes to zero as the successive approximations improve.
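The abstract's core numerical building block, recursive least squares (RLS), can be sketched for linear value-function approximation. This is a minimal illustration only: the feature map, targets, and variable names below are assumptions for the sketch, not the paper's exact formulation of RLSAPI.

```python
import numpy as np

def rls_update(w, P, phi, target):
    """One recursive least squares step (Sherman-Morrison rank-one update).

    w      : current weight vector of the linear value model, shape (d,)
    P      : running inverse-covariance estimate, shape (d, d)
    phi    : feature vector of the visited state, shape (d,)
    target : sampled regression target (e.g. reward + gamma * V(next state))
    """
    Pphi = P @ phi
    k = Pphi / (1.0 + phi @ Pphi)      # gain vector
    err = target - phi @ w             # prediction residual
    w_new = w + k * err
    P_new = P - np.outer(k, Pphi)      # rank-one downdate of P
    return w_new, P_new

# Toy usage: recover a fixed linear "value function" from noisy targets.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])    # hypothetical ground-truth weights
w = np.zeros(3)
P = np.eye(3) * 1e3                    # large initial P => weak prior on w
for _ in range(500):
    phi = rng.normal(size=3)
    target = phi @ true_w + 0.01 * rng.normal()
    w, P = rls_update(w, P, phi, target)
print(np.round(w, 2))
```

Inside an approximate policy iteration loop, an update of this kind would serve as the policy-evaluation step, with the greedy policy recomputed from the fitted value estimate between sweeps.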
