IEEE Conference on Decision and Control

Policy Gradient Stochastic Approximation Algorithms for Adaptive Control of Constrained Time Varying Markov Decision Processes



Abstract

We present constrained stochastic approximation algorithms for computing the locally optimal policy of a constrained average-cost finite-state Markov decision process. The stochastic approximation algorithms require computation of the gradient of the cost function with respect to the parameter that characterizes the randomized policy. This gradient is computed by novel simulation-based gradient estimation schemes involving weak derivatives. The proposed algorithms are simulation based and do not require explicit knowledge of the underlying parameters such as transition probabilities. We present three classes of algorithms, based on primal-dual methods, augmented Lagrangian (multiplier) methods, and gradient projection primal methods. Unlike neuro-dynamic programming methods such as Q-learning, the algorithms proposed here can handle constraints and time-varying parameters.

