Journal: Automatica

Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton-Jacobi equations


Abstract

In this paper we present an online adaptive control algorithm based on policy iteration reinforcement learning techniques to solve the continuous-time (CT) multi-player non-zero-sum (NZS) game with infinite horizon for linear and nonlinear systems. NZS games allow players to have a cooperative team component and an individual selfish component of strategy. The adaptive algorithm learns online the solution of coupled Riccati equations and coupled Hamilton-Jacobi equations for linear and nonlinear systems, respectively. This adaptive control method finds, in real time, approximations of the optimal value and the NZS Nash equilibrium, while also guaranteeing closed-loop stability. The optimal-adaptive algorithm is implemented as a separate actor/critic parametric network approximator structure for every player, and involves simultaneous continuous-time adaptation of the actor/critic networks. A persistence of excitation condition is shown to guarantee convergence of every critic to the actual optimal value function for that player. A detailed mathematical analysis is given for 2-player NZS games. Novel tuning algorithms are given for the actor/critic networks. Convergence to the Nash equilibrium is proven, and stability of the system is also guaranteed. This provides optimal adaptive control solutions for both non-zero-sum games and their special case, the zero-sum games. Simulation examples show the effectiveness of the new algorithm.
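To make the phrase "coupled Riccati equations" concrete for the linear case, the sketch below solves a scalar 2-player linear-quadratic NZS game with an offline Lyapunov (policy) iteration. This is not the online actor/critic algorithm of the paper; it is a standard offline substitute chosen only for illustration. The dynamics, cost weights, function names, and numeric values are all assumptions, and the iteration is not guaranteed to converge for arbitrary parameters.

```python
# Illustrative sketch only: offline policy iteration for a scalar 2-player
# linear-quadratic non-zero-sum game. All names and numbers are hypothetical.
#
# Assumed dynamics:  dx/dt = a*x + b[0]*u1 + b[1]*u2
# Assumed cost of player i:
#     J_i = integral of ( q[i]*x^2 + r[i][0]*u1^2 + r[i][1]*u2^2 ) dt
# Feedback policies: u_j = -k[j]*x, value functions V_i(x) = p[i]*x^2.


def solve_scalar_nzs_game(a, b, q, r, iters=50):
    """Alternate policy evaluation and policy improvement for both players."""
    k = [0.0, 0.0]  # zero initial gains, so a < 0 is needed for a stable start
    p = [0.0, 0.0]
    for _ in range(iters):
        a_cl = a - b[0] * k[0] - b[1] * k[1]  # closed-loop pole
        if a_cl >= 0.0:
            raise ValueError("current gains are not stabilizing")
        # Policy evaluation: scalar Lyapunov equation for each player,
        #   2*a_cl*p_i + q_i + r_i1*k_1^2 + r_i2*k_2^2 = 0
        p = [
            (q[i] + r[i][0] * k[0] ** 2 + r[i][1] * k[1] ** 2) / (-2.0 * a_cl)
            for i in range(2)
        ]
        # Policy improvement: k_j = b_j * p_j / r_jj
        k = [b[j] * p[j] / r[j][j] for j in range(2)]
    return p, k


def riccati_residuals(a, b, q, r, p):
    """Residuals of the two coupled scalar Riccati equations at p."""
    k = [b[j] * p[j] / r[j][j] for j in range(2)]
    a_cl = a - b[0] * k[0] - b[1] * k[1]
    return [
        2.0 * a_cl * p[i] + q[i] + r[i][0] * k[0] ** 2 + r[i][1] * k[1] ** 2
        for i in range(2)
    ]


# Hypothetical example data (open-loop stable, so zero gains are admissible).
a = -1.0
b = [1.0, 1.0]
q = [1.0, 2.0]
r = [[1.0, 0.5], [0.5, 1.0]]  # r[i][j]: weight of u_j in player i's cost

p, k = solve_scalar_nzs_game(a, b, q, r)
residuals = riccati_residuals(a, b, q, r, p)
closed_loop_pole = a - b[0] * k[0] - b[1] * k[1]
```

At a fixed point of the iteration, both residuals vanish together, which is the sense in which the two Riccati equations are coupled: each player's equation depends on the other player's gain through the shared closed-loop dynamics and the cross weights.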
