A convergent recursive least squares approximate policy iteration algorithm for multi-dimensional Markov decision process with continuous state and action spaces


Abstract

In this paper, we present a recursive least squares approximate policy iteration (RLSAPI) algorithm for infinite-horizon multi-dimensional Markov decision processes with continuous state and action spaces. Under certain structural assumptions on the value functions and policy spaces, the algorithm is provably convergent in the mean: the mean absolute deviation of the approximate policy value function from the optimal value function goes to zero as the successive approximations improve.
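The abstract does not give the update equations. As a rough illustration of the "recursive least squares" ingredient only, the sketch below shows the textbook RLS update for a value function that is linear in a set of features. It is not the authors' RLSAPI algorithm: the feature map, the quadratic target function, the initialisation of `B`, and all names are assumptions made for this example.

```python
import numpy as np

def rls_update(theta, B, phi, target):
    """One textbook recursive least squares step for target ~ phi @ theta.

    theta : current coefficient vector, shape (k,)
    B     : current estimate of the inverse Gram matrix (symmetric), shape (k, k)
    phi   : feature vector of the sampled state, shape (k,)
    target: observed scalar value to fit
    """
    B_phi = B @ phi
    gamma = 1.0 + phi @ B_phi            # scalar from the Sherman-Morrison formula
    error = target - phi @ theta         # prediction error before the update
    theta_new = theta + (B_phi / gamma) * error
    B_new = B - np.outer(B_phi, B_phi) / gamma
    return theta_new, B_new

def features(s):
    # Hypothetical quadratic basis for a scalar state s.
    return np.array([1.0, s, s * s])

# Hypothetical noise-free "value" samples from V(s) = 1 + 2 s + 3 s^2.
rng = np.random.default_rng(0)
theta = np.zeros(3)
B = 1e6 * np.eye(3)                      # large initial B acts as a weak prior
for s in rng.uniform(-1.0, 1.0, size=200):
    target = 1.0 + 2.0 * s + 3.0 * s * s
    theta, B = rls_update(theta, B, features(s), target)

print(theta)
```

Each step costs O(k^2) and avoids refitting from scratch. In this noise-free setup the printed coefficients should end up close to the generating values (1, 2, 3).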


Bibliographic record

Source
Adaptive Dynamic Programming and Reinforcement Learning, 2009. ADPRL '09 | 2009 | pp. 66-73 | 8 pages
Conference location: Nashville, TN (US)
Authors
Jun Ma; Warren B. Powell
Author affiliation

Dept. of Oper. Res. & Financial Eng., Princeton Univ., Princeton, NJ

Original format: PDF
Keywords
Markov processes; approximation theory; convergence of numerical methods; decision theory; iterative methods; least squares approximations; action space; approximate policy value function; continuous state space; convergent recursive least square approximate policy iteration algorithm; mean absolute deviation; multidimensional Markov decision process; optimal value function;


Similar literature

1. Estimate and approximate policy iteration algorithm for discounted Markov decision models with bounded costs and Borel spaces [J]. M. Teresa Robles-Alcaraz, Oscar Vega-Amaya, J. Adolfo Minjarez-Sosa. Risk and Decision Analysis, 2017, No. 2.
2. Policy Iteration for Continuous-Time Average Reward Markov Decision Processes in Polish Spaces [J]. Quanxin Zhu, Xinsong Yang, Chuangxia Huang. Abstract and Applied Analysis, 2009, No. 6.
3. A note on the convergence of policy iteration in Markov decision processes with compact action spaces [J]. Golubin A. Y. Mathematics of Operations Research, 2003, No. 1.
4. A Convergent Recursive Least Squares Approximate Policy Iteration Algorithm for Multi-Dimensional Markov Decision Process with Continuous State and Action Spaces [C]. Jun Ma, Warren B. Powell. IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009.
5. Approximate Policy Iteration Algorithms for Continuous, Multidimensional Applications and Convergence Analysis [D]. Ma, Jun. 2011.
6. Using model-based proposals for fast parameter inference on discrete state space continuous-time Markov processes [O]. C. M. Pooley, S. C. Bishop, G. Marion. 2015.
7. A Convergent Recursive Least Squares Approximate Policy Iteration Algorithm for Multi-Dimensional Markov Decision Process with Continuous State and Action Spaces [O]. Jun Ma, Warren B. Powell. 2009.
