
Policy gradient stochastic approximation algorithms for adaptive control of constrained time varying Markov decision processes



Abstract

We present constrained stochastic approximation algorithms for computing the locally optimal policy of a constrained average-cost finite-state Markov decision process. The stochastic approximation algorithms require computing the gradient of the cost function with respect to the parameter that characterizes the randomized policy. This gradient is computed by novel simulation-based estimation schemes involving weak derivatives. The proposed algorithms are simulation-based and do not require explicit knowledge of the underlying parameters, such as transition probabilities. We present three classes of algorithms based on primal-dual methods, augmented Lagrangian (multiplier) methods, and gradient-projection primal methods. Unlike neuro-dynamic programming methods such as Q-learning, the algorithms proposed here can handle constraints and time-varying parameters.
