Transactions of the Institute of Measurement and Control

Off-policy integral reinforcement learning algorithm in dealing with nonzero sum game for nonlinear distributed parameter systems



Abstract

Benefiting from integral reinforcement learning (IRL), the nonzero-sum (NZS) game for distributed parameter systems is effectively solved in this paper when the system dynamics are unavailable. The Karhunen-Loeve decomposition (KLD) is employed to convert the partial differential equation (PDE) system into a high-order ordinary differential equation (ODE) system. Moreover, off-policy IRL is introduced to design the optimal strategies for the NZS game. To confirm that the presented algorithm converges to the optimal value functions, the traditional adaptive dynamic programming (ADP) method is discussed first, and the equivalence between the traditional ADP method and the presented off-policy method is then proved. To implement the presented off-policy IRL method, actor and critic neural networks are utilized to approximate the control strategies and the value functions, respectively, during the iteration process. Finally, a numerical simulation is presented to illustrate the effectiveness of the proposed off-policy algorithm.
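As a rough illustration of the Karhunen-Loeve decomposition step mentioned in the abstract, the sketch below uses the standard snapshot/SVD formulation to extract a few dominant spatial modes from sampled PDE states; the time coefficients of these modes are what a finite ODE model would evolve. All variable and function names here are illustrative assumptions, not taken from the paper.

    import numpy as np

    def kl_decomposition(snapshots, n_modes):
        """Karhunen-Loeve (POD) decomposition of PDE solution snapshots.

        snapshots: array of shape (n_space, n_time), each column a sampled
                   spatial profile of the PDE state.
        Returns the first n_modes spatial basis functions, the time-varying
        modal coordinates, the temporal mean, and the captured energy, so
        that u(x, t) is approximated by sum_i a_i(t) * phi_i(x).
        """
        # Subtract the temporal mean so the modes capture fluctuations only.
        mean = snapshots.mean(axis=1, keepdims=True)
        fluct = snapshots - mean
        # SVD of the snapshot matrix yields the empirical eigenfunctions.
        U, s, Vt = np.linalg.svd(fluct, full_matrices=False)
        phi = U[:, :n_modes]          # spatial modes phi_i(x)
        a = phi.T @ fluct             # modal coordinates a_i(t_k)
        energy = s[:n_modes] ** 2 / np.sum(s ** 2)
        return phi, a, mean, energy

    # Example: snapshots of a 1-D PDE state on 100 grid points at 200 times.
    X = np.random.rand(100, 200)      # placeholder data for illustration
    phi, a, mean, energy = kl_decomposition(X, n_modes=3)
    print("captured energy per mode:", energy)

In the setting described by the abstract, projecting the PDE dynamics onto such modes would produce the high-order ODE system to which the off-policy IRL iteration and the actor-critic approximation are then applied.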
