IEEE Conference on Decision and Control

Policy Gradient Stochastic Approximation Algorithms for Adaptive Control of Constrained Time Varying Markov Decision Processes



Abstract

We present constrained stochastic approximation algorithms for computing the locally optimal policy of a constrained average-cost finite-state Markov decision process. The stochastic approximation algorithms require computation of the gradient of the cost function with respect to the parameter that characterizes the randomized policy. This gradient is computed by novel simulation-based gradient estimation schemes involving weak derivatives. The proposed algorithms are simulation based and do not require explicit knowledge of the underlying parameters such as transition probabilities. We present three classes of algorithms, based on primal-dual methods, augmented Lagrangian (multiplier) methods, and gradient projection primal methods. Unlike neuro-dynamic programming methods such as Q-learning, the algorithms proposed here can handle constraints and time-varying parameters.

