Online solution of nonlinear two-player zero-sum games using synchronous policy iteration

Vamvoudakis K.G.; Lewis F.L.

Abstract

The two-player zero-sum (ZS) game problem provides the solution to the bounded L_2-gain problem and so is important for robust control. However, its solution depends on solving a design Hamilton-Jacobi-Isaacs (HJI) equation, which is generally intractable for nonlinear systems. In this paper, we present an online adaptive learning algorithm based on policy iteration to solve the continuous-time two-player ZS game with infinite horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns online in real time an approximate local solution to the game HJI equation. This method finds, in real time, suitable approximations of the optimal value and the saddle point feedback control policy and disturbance policy, while also guaranteeing closed-loop stability. The adaptive algorithm is implemented as an actor/critic/disturbance structure that involves simultaneous continuous-time adaptation of critic, actor, and disturbance neural networks. We call this online gaming algorithm 'synchronous' ZS game policy iteration. A persistence of excitation condition is shown to guarantee convergence of the critic to the actual optimal value function. Novel tuning algorithms are given for critic, actor, and disturbance networks. The convergence to the optimal saddle point solution is proven, and stability of the system is also guaranteed. Simulation examples show the effectiveness of the new algorithm in solving the HJI equation online for a linear system and a complex nonlinear system.
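
As a rough illustration of the policy-iteration idea mentioned in the abstract, the sketch below treats the simplest case one could pick: a scalar linear system x' = a*x + b*u + k*d with cost integrand q*x^2 + r*u^2 - gamma^2*d^2 and a quadratic value V(x) = p*x^2, for which the HJI equation reduces to a scalar game Riccati equation. This is an offline iteration that alternates policy evaluation and a simultaneous update of both players; it is not the online synchronous actor/critic/disturbance tuning of the paper. The scalar model, the function names, and the numerical values are all assumptions made for illustration.

```python
import math


def solve_scalar_zs_game(a, b, k, q, r, gamma, iters=50, tol=1e-12):
    """Offline policy iteration for the scalar zero-sum game (illustrative).

    Policies are u = -ku * x (control) and d = kd * x (disturbance).
    Starts from ku = kd = 0, so the open loop must be stable (a < 0).
    Returns (p, ku, kd) with value function V(x) = p * x**2.
    """
    ku, kd = 0.0, 0.0
    p = 0.0
    for _ in range(iters):
        # Closed-loop dynamics under the current pair of policies.
        a_cl = a - b * ku + k * kd
        if a_cl >= 0.0:
            raise ValueError("current policies do not stabilize the system")
        # Policy evaluation: solve 2*a_cl*p + q + r*ku^2 - gamma^2*kd^2 = 0.
        p_new = (q + r * ku**2 - gamma**2 * kd**2) / (-2.0 * a_cl)
        # Policy improvement for both players from the new value.
        ku = b * p_new / r
        kd = k * p_new / gamma**2
        converged = abs(p_new - p) < tol
        p = p_new
        if converged:
            break
    return p, ku, kd


def closed_form_p(a, b, k, q, r, gamma):
    """Stabilizing root of q + 2*a*p - p^2 * (b^2/r - k^2/gamma^2) = 0."""
    s = b**2 / r - k**2 / gamma**2
    if s <= 0.0:
        raise ValueError("gamma is too small for this closed form")
    return (a + math.sqrt(a**2 + s * q)) / s


if __name__ == "__main__":
    params = dict(a=-1.0, b=1.0, k=1.0, q=1.0, r=1.0, gamma=2.0)
    p, ku, kd = solve_scalar_zs_game(**params)
    print("iterated p:", p, "closed-form p:", closed_form_p(**params))
    print("control gain ku:", ku, "disturbance gain kd:", kd)
```

In this scalar setting the iteration coincides with Newton's method on the Riccati equation, which is why a handful of steps is enough. For nonlinear dynamics no such closed form is available, which is the case the abstract addresses with neural-network approximation.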

Bibliographic record

Source: International Journal of Robust and Nonlinear Control, 2012, Issue 13, 24 pages
Authors: Vamvoudakis K.G.; Lewis F.L.
Original format: PDF
Language: English
CLC classification: Automation systems
Keywords: approximate dynamic programming; Hamilton-Jacobi-Isaacs equation; Nash equilibrium; synchronous zero-sum game policy iteration

Similar literature

1. Online solution of nonlinear two-player zero-sum games using synchronous policy iteration [J]. Vamvoudakis K.G., Lewis F.L. International Journal of Robust and Nonlinear Control. 2012, Issue 13
2. Stable value iteration for two-player zero-sum game of discrete-time nonlinear systems based on adaptive dynamic programming [J]. Song Ruizhuo, Zhu Liao. Neurocomputing. 2019, Issue MAYa7
3. Online concurrent reinforcement learning algorithm to solve two-player zero-sum games for partially unknown nonlinear continuous-time systems [J]. Yasini Sholeh, Karimpour Ali, Sistani Mohammad-Bagher Naghibi. International Journal of Adaptive Control and Signal Processing. 2015, Issue 4
4. Online solution of nonlinear two-player zero-sum games using synchronous policy iteration [C]. Vamvoudakis K.G., Lewis F.L. 49th IEEE Conference on Decision and Control. 2010
5. Deception in two-player zero-sum stochastic games: Theory and application to warfare games [D]. Singh, Rajdeep. 2006
6. Modified Asano-Ohya-Khrennikov quantum-like model for decision-making process in a two-player game with nonlinear self- and cross-interaction terms of brain’s amygdala and prefrontal-cortex [O]. Luluk Muthoharoh, Hendradi Hardhienata, Husin Alatas. 2020
7. Online Gaming: Real Time Solution of Nonlinear Two-Player Zero-Sum Games Using Synchronous Policy Iteration [O]. Kyriakos G., Frank L. 2011
