Journal of Optimization Theory and Applications

An Online Actor-Critic Algorithm with Function Approximation for Constrained Markov Decision Processes

Abstract

We develop an online actor-critic reinforcement learning algorithm with function approximation for a problem of control under inequality constraints. We consider the long-run average cost Markov decision process (MDP) framework, in which both the objective and the constraint functions are suitable policy-dependent long-run averages of certain sample path functions. The Lagrange multiplier method is used to handle the inequality constraints. We prove the asymptotic almost sure convergence of our algorithm to a locally optimal solution. We also provide the results of numerical experiments on a problem of routing in a multi-stage queueing network with constraints on long-run average queue lengths. We observe that our algorithm exhibits good performance in this setting and converges to a feasible point.
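
The abstract describes the algorithm only at a high level. As a rough illustration of the Lagrangian actor-critic idea it outlines, the Python sketch below runs an average-cost actor-critic on a small synthetic constrained MDP, folding the constraint cost into the single-stage cost through a Lagrange multiplier that is updated on the slowest timescale. Everything concrete here is an assumption made for illustration, not the authors' published algorithm: the toy MDP, the one-hot features standing in for general linear function approximation, the softmax policy parameterization, and the step-size values.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy constrained MDP (illustrative only; the paper's experiments use a
# multi-stage queueing network with average queue-length constraints). ---
n_states, n_actions = 5, 2
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a distribution over next states
cost = rng.uniform(0.0, 1.0, size=(n_states, n_actions))          # objective single-stage cost
constraint_cost = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
bound = 0.5  # constraint: long-run average constraint cost <= bound

def features(s):
    """One-hot state features for the linear critic (stand-in for general features)."""
    phi = np.zeros(n_states)
    phi[s] = 1.0
    return phi

def policy(theta, s):
    """Softmax policy over actions with linear preferences theta[s]."""
    prefs = theta[s] - theta[s].max()
    p = np.exp(prefs)
    return p / p.sum()

# Critic weights, actor preferences, Lagrange multiplier, and running
# estimates of the two long-run averages.
v = np.zeros(n_states)
theta = np.zeros((n_states, n_actions))
lam = 0.0
avg_lagrangian_cost = 0.0
avg_constraint = 0.0

# Three step sizes on separate timescales: critic fastest, actor slower,
# multiplier slowest (values chosen ad hoc for this toy example).
a_critic, a_actor, a_lam = 0.05, 0.01, 0.001

s = 0
for t in range(200_000):
    p = policy(theta, s)
    a = rng.choice(n_actions, p=p)
    s_next = rng.choice(n_states, p=P[s, a])

    c = cost[s, a]
    d = constraint_cost[s, a]
    # Lagrangian single-stage cost: objective plus penalized constraint violation.
    c_lag = c + lam * (d - bound)

    # Average-cost TD error for the Lagrangian cost.
    delta = c_lag - avg_lagrangian_cost + v[s_next] - v[s]

    # Critic: track the average Lagrangian cost and do TD(0) on the differential value.
    avg_lagrangian_cost += a_critic * (c_lag - avg_lagrangian_cost)
    v += a_critic * delta * features(s)

    # Actor: policy-gradient step; descend, since we minimize average cost.
    grad_log = -p
    grad_log[a] += 1.0          # grad of log softmax: e_a - p
    theta[s] -= a_actor * delta * grad_log

    # Multiplier: ascend on estimated constraint violation, projected to [0, inf).
    avg_constraint += a_critic * (d - avg_constraint)
    lam = max(0.0, lam + a_lam * (avg_constraint - bound))

    s = s_next

print(f"lambda = {lam:.3f}, avg constraint cost = {avg_constraint:.3f} (bound {bound})")
```

The three step sizes are ordered so that the critic tracks the current policy, the actor moves more slowly, and the multiplier moves slowest of all; this separation of timescales is typically what almost-sure convergence arguments of the kind claimed in the abstract rest on in the multi-timescale stochastic approximation literature.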