Journal of Optimization Theory and Applications

An Online Actor-Critic Algorithm with Function Approximation for Constrained Markov Decision Processes

Abstract

We develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework, in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance in this setting and converges to a feasible point.
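
The abstract describes the algorithm only at a high level. As a rough illustration of the Lagrangian actor-critic idea it outlines, the Python sketch below runs an average-cost actor-critic on a small synthetic constrained MDP, folding the constraint cost into the single-stage cost through a Lagrange multiplier that is updated on the slowest timescale. Everything concrete here is an assumption made for illustration, not the authors' published algorithm: the toy MDP, the one-hot features standing in for general linear function approximation, the softmax policy parameterization, and the step-size values.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy constrained MDP (illustrative only; the paper's experiments use a
# multi-stage queueing network with average queue-length constraints). ---
n_states, n_actions = 5, 2
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a distribution over next states
cost = rng.uniform(0.0, 1.0, size=(n_states, n_actions))          # objective single-stage cost
constraint_cost = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
bound = 0.5  # constraint: long-run average constraint cost <= bound

def features(s):
    """One-hot state features for the linear critic (stand-in for general features)."""
    phi = np.zeros(n_states)
    phi[s] = 1.0
    return phi

def policy(theta, s):
    """Softmax policy over actions with linear preferences theta[s]."""
    prefs = theta[s] - theta[s].max()
    p = np.exp(prefs)
    return p / p.sum()

# Critic weights, actor preferences, Lagrange multiplier, and running
# estimates of the two long-run averages.
v = np.zeros(n_states)
theta = np.zeros((n_states, n_actions))
lam = 0.0
avg_lagrangian_cost = 0.0
avg_constraint = 0.0

# Three step sizes on separate timescales: critic fastest, actor slower,
# multiplier slowest (values chosen ad hoc for this toy example).
a_critic, a_actor, a_lam = 0.05, 0.01, 0.001

s = 0
for t in range(200_000):
    p = policy(theta, s)
    a = rng.choice(n_actions, p=p)
    s_next = rng.choice(n_states, p=P[s, a])

    c = cost[s, a]
    d = constraint_cost[s, a]
    # Lagrangian single-stage cost: objective plus penalized constraint violation.
    c_lag = c + lam * (d - bound)

    # Average-cost TD error for the Lagrangian cost.
    delta = c_lag - avg_lagrangian_cost + v[s_next] - v[s]

    # Critic: track the average Lagrangian cost and do TD(0) on the differential value.
    avg_lagrangian_cost += a_critic * (c_lag - avg_lagrangian_cost)
    v += a_critic * delta * features(s)

    # Actor: policy-gradient step; descend, since we minimize average cost.
    grad_log = -p
    grad_log[a] += 1.0          # grad of log softmax: e_a - p
    theta[s] -= a_actor * delta * grad_log

    # Multiplier: ascend on estimated constraint violation, projected to [0, inf).
    avg_constraint += a_critic * (d - avg_constraint)
    lam = max(0.0, lam + a_lam * (avg_constraint - bound))

    s = s_next

print(f"lambda = {lam:.3f}, avg constraint cost = {avg_constraint:.3f} (bound {bound})")
```

The three step sizes are ordered so that the critic tracks the current policy, the actor moves more slowly, and the multiplier moves slowest of all; this separation of timescales is typically what almost-sure convergence arguments of the kind claimed in the abstract rest on in the multi-timescale stochastic approximation literature.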