Journal: International Journal of Robust and Nonlinear Control

Online solution of nonlinear two-player zero-sum games using synchronous policy iteration



Abstract

The two-player zero-sum (ZS) game problem provides the solution to the bounded L_2-gain problem and so is important for robust control. However, its solution depends on solving a design Hamilton-Jacobi-Isaacs (HJI) equation, which is generally intractable for nonlinear systems. In this paper, we present an online adaptive learning algorithm based on policy iteration to solve the continuous-time two-player ZS game with infinite horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns online in real time an approximate local solution to the game HJI equation. This method finds, in real time, suitable approximations of the optimal value and the saddle point feedback control policy and disturbance policy, while also guaranteeing closed-loop stability. The adaptive algorithm is implemented as an actor/critic/disturbance structure that involves simultaneous continuous-time adaptation of critic, actor, and disturbance neural networks. We call this online gaming algorithm 'synchronous' ZS game policy iteration. A persistence of excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for critic, actor, and disturbance networks. The convergence to the optimal saddle point solution is proven, and stability of the system is also guaranteed. Simulation examples show the effectiveness of the new algorithm in solving the HJI equation online for a linear system and a complex nonlinear system.
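
As a rough illustration of what "policy iteration" for a ZS game means, the sketch below treats the simplest possible case: a scalar linear system dx/dt = a*x + b*u + k*d with cost integrand q*x^2 + r*u^2 - gamma^2*d^2 and a quadratic value V(x) = p*x^2. In that case the HJI equation reduces to the scalar Riccati equation 0 = q + 2*a*p - (b^2/r - k^2/gamma^2)*p^2. This is an offline, model-based iteration that updates the control and disturbance policies together. It is not the online synchronous actor/critic/disturbance algorithm of the paper, and the function names, parameter values, and the assumption b^2/r > k^2/gamma^2 are all chosen here for illustration only.

```python
import math


def solve_scalar_zs_game(a, b, k, q, r, gamma, p0=0.0, tol=1e-12, max_iter=50):
    """Offline policy iteration for a scalar linear zero-sum game (illustrative).

    Dynamics: dx/dt = a*x + b*u + k*d
    Cost:     integral of q*x^2 + r*u^2 - gamma^2*d^2
    Value:    V(x) = p*x^2 (hypothetical parametrization for this sketch)
    """
    p = p0
    for _ in range(max_iter):
        # Policy improvement: u = -gain_u * x, d = +gain_d * x
        gain_u = b * p / r
        gain_d = k * p / gamma**2
        a_cl = a - b * gain_u + k * gain_d  # closed-loop pole
        if a_cl >= 0:
            raise ValueError("current policy pair is not stabilizing")
        # Policy evaluation (scalar Lyapunov equation):
        # 0 = q + r*gain_u^2 - gamma^2*gain_d^2 + 2*p_next*a_cl
        p_next = -(q + r * gain_u**2 - gamma**2 * gain_d**2) / (2.0 * a_cl)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p


def closed_form_p(a, b, k, q, r, gamma):
    """Stabilizing root of 0 = q + 2*a*p - s*p^2, assuming s > 0."""
    s = b**2 / r - k**2 / gamma**2
    if s <= 0:
        raise ValueError("this sketch assumes b^2/r > k^2/gamma^2")
    return (a + math.sqrt(a**2 + q * s)) / s


# Example values (assumptions, not taken from the paper)
params = dict(a=-1.0, b=1.0, k=1.0, q=1.0, r=1.0, gamma=2.0)
p_iter = solve_scalar_zs_game(**params)
p_exact = closed_form_p(**params)
s = params["b"] ** 2 / params["r"] - params["k"] ** 2 / params["gamma"] ** 2
residual = params["q"] + 2 * params["a"] * p_iter - s * p_iter**2
```

In this scalar setting the iteration coincides with Newton's method on the Riccati equation, so `p_iter` should agree with `p_exact` and `residual` should be close to zero. The nonlinear case described in the abstract replaces the single parameter p with neural-network weights tuned online.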
