Neurocomputing

Neural-network-based learning algorithms for cooperative games of discrete-time multi-player systems with control constraints via adaptive dynamic programming

Abstract

Adaptive dynamic programming (ADP), an important branch of reinforcement learning, is a powerful tool for solving various optimal control problems. However, cooperative game problems for discrete-time multi-player systems with control constraints have rarely been investigated in this field. To address this issue, a novel policy iteration (PI) algorithm based on the ADP technique is proposed, and its convergence is analyzed in this brief paper. For the proposed PI algorithm, an online neural network (NN) implementation scheme with a multiple-network structure is presented. In this online NN-based learning algorithm, a critic network, constrained actor networks and unconstrained actor networks are employed to approximate the value function and the constrained and unconstrained control policies, respectively, and the NN weight updating laws are designed based on the gradient descent method. Finally, a numerical simulation example is provided to demonstrate the effectiveness of the proposed algorithm. (C) 2019 Elsevier B.V. All rights reserved.
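The abstract describes the multi-network structure but gives no implementation details. Below is a minimal sketch of that structure under several assumptions of my own: a hypothetical two-player discrete-time linear system, a shared quadratic stage cost, single-hidden-layer tanh networks, and a tanh squashing to encode the control constraint (a common device in constrained ADP, not necessarily the paper's construction). It shows a critic network approximating the value function and two actor networks (one constrained, one unconstrained) updated by gradient descent on the Bellman residual and the critic-predicted cost-to-go; it is not the authors' PI algorithm or their exact weight updating laws.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_init(n_in, n_hidden, n_out):
    # One-hidden-layer network with tanh activation (assumed architecture).
    return {"W1": 0.1 * rng.standard_normal((n_hidden, n_in)),
            "W2": 0.1 * rng.standard_normal((n_out, n_hidden))}

def mlp_forward(params, x):
    h = np.tanh(params["W1"] @ x)
    return params["W2"] @ h, h

def mlp_grad_step(params, x, dL_dy, lr):
    # Gradient-descent weight update given the loss gradient w.r.t. the output.
    _, h = mlp_forward(params, x)
    dh = (params["W2"].T @ dL_dy) * (1.0 - h ** 2)   # backprop through tanh
    params["W2"] -= lr * np.outer(dL_dy, h)
    params["W1"] -= lr * np.outer(dh, x)

def critic_dx(params, x):
    # Gradient of the scalar critic output with respect to the state x.
    _, h = mlp_forward(params, x)
    return params["W1"].T @ (params["W2"][0] * (1.0 - h ** 2))

# Hypothetical two-player linear system x_{k+1} = A x_k + B1 u1 + B2 u2.
A  = np.array([[0.9, 0.1], [0.0, 0.8]])
B1 = np.array([[0.1], [0.0]])    # player 1: input saturated, |u1| <= u_max
B2 = np.array([[0.0], [0.1]])    # player 2: unconstrained input
u_max, gamma, lr = 0.5, 0.95, 1e-2
Q, R1, R2 = np.eye(2), 0.1, 0.1  # shared (cooperative) quadratic stage cost

critic = mlp_init(2, 8, 1)   # value function V(x)
actor1 = mlp_init(2, 8, 1)   # constrained control policy (tanh-squashed)
actor2 = mlp_init(2, 8, 1)   # unconstrained control policy

for episode in range(200):
    x = rng.uniform(-1.0, 1.0, size=2)
    for k in range(50):
        a1_raw, _ = mlp_forward(actor1, x)
        u1 = u_max * np.tanh(a1_raw)          # enforce the control constraint
        u2, _ = mlp_forward(actor2, x)
        x_next = A @ x + B1 @ u1 + B2 @ u2

        cost = x @ Q @ x + R1 * float(u1[0]) ** 2 + R2 * float(u2[0]) ** 2
        v, _ = mlp_forward(critic, x)
        v_next, _ = mlp_forward(critic, x_next)

        # Critic update: drive the Bellman residual toward zero.
        td_err = v - (cost + gamma * v_next)
        mlp_grad_step(critic, x, td_err, lr)

        # Actor updates: descend the critic-predicted cost-to-go, chaining
        # through the tanh saturation for the constrained player.
        dVdx_next = critic_dx(critic, x_next)
        dJ_du1 = 2.0 * R1 * u1 + gamma * (B1.T @ dVdx_next)
        dJ_du2 = 2.0 * R2 * u2 + gamma * (B2.T @ dVdx_next)
        mlp_grad_step(actor1, x, dJ_du1 * u_max * (1.0 - np.tanh(a1_raw) ** 2), lr)
        mlp_grad_step(actor2, x, dJ_du2, lr)
        x = x_next
```

The separate constrained and unconstrained actors mirror the multiple-network scheme named in the abstract; all numerical values and network sizes here are placeholders rather than the paper's simulation settings.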
