
Policy gradient stochastic approximation algorithms for adaptive control of constrained time varying Markov decision processes



Abstract

We present constrained stochastic approximation algorithms for computing the locally optimal policy of a constrained average-cost finite-state Markov decision process. The stochastic approximation algorithms require computing the gradient of the cost function with respect to the parameter that characterizes the randomized policy. This gradient is computed by novel simulation-based estimation schemes involving weak derivatives. The proposed algorithms are simulation-based and do not require explicit knowledge of the underlying parameters, such as transition probabilities. We present three classes of algorithms based on primal-dual methods, augmented Lagrangian (multiplier) methods, and gradient-projection primal methods. Unlike neuro-dynamic programming methods such as Q-learning, the algorithms proposed here can handle constraints and time-varying parameters.
