International Journal of Adaptive Control and Signal Processing

Output-feedback H_∞ quadratic tracking control of linear systems using reinforcement learning



Abstract

This paper presents an online learning algorithm based on integral reinforcement learning (IRL) to design an output-feedback (OPFB) H_∞ tracking controller for partially unknown linear continuous-time systems. Although reinforcement learning techniques have been successfully applied to find optimal state-feedback controllers, in most control applications it is impractical to measure the full system state, so OPFB controllers are desirable. To this end, a general bounded L_2-gain tracking problem with a discounted performance function is formulated for OPFB H_∞ tracking. A tracking game algebraic Riccati equation is then derived whose solution gives a Nash equilibrium of the associated min-max optimization problem. An IRL algorithm is developed to solve this game algebraic Riccati equation online without requiring complete knowledge of the system dynamics. At each iteration, the algorithm solves an IRL Bellman equation online, in real time, to evaluate an OPFB policy, and then updates the OPFB gain using the information provided by the evaluated policy. An adaptive observer supplies the full-state estimates needed by the IRL Bellman equation during learning; once learning is finished, the observer is no longer required. A simulation example verifies the convergence of the proposed algorithm to a suboptimal OPFB solution and demonstrates the performance of the proposed method.
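The iterative structure the abstract describes can be illustrated with a minimal sketch, under stated assumptions: model-based policy iteration on the (undiscounted, state-feedback, regulation) zero-sum game algebraic Riccati equation that underlies H_∞ design. The paper's IRL algorithm instead evaluates each policy from measured trajectory data via an IRL Bellman equation, so the drift matrix `A` is not needed online; the discount factor, reference-tracking augmentation, output-feedback parametrization, and adaptive observer are all omitted here, and every numerical value (`A`, `B`, `D`, `Q`, `R`, `gamma`) is made up for illustration.

```python
import numpy as np

def solve_lyapunov(Ac, Qc):
    """Solve Ac' P + P Ac + Qc = 0 for P via Kronecker vectorization."""
    n = Ac.shape[0]
    M = np.kron(np.eye(n), Ac.T) + np.kron(Ac.T, np.eye(n))
    p = np.linalg.solve(M, -Qc.flatten(order="F"))
    return p.reshape(n, n, order="F")

def game_are_policy_iteration(A, B, D, Q, R, gamma, n_iter=100, tol=1e-12):
    """Alternate policy evaluation / improvement for both players until P
    converges to the saddle-point (Nash) solution of
    A'P + PA + Q - P B R^{-1} B' P + (1/gamma^2) P D D' P = 0."""
    n = A.shape[0]
    P = np.zeros((n, n))
    K = np.zeros((B.shape[1], n))      # control gain, start from zero policy
    L = np.zeros((D.shape[1], n))      # worst-case disturbance gain
    for _ in range(n_iter):
        Ac = A - B @ K + D @ L         # closed loop under current policies
        Qc = Q + K.T @ R @ K - gamma**2 * L.T @ L
        P_new = solve_lyapunov(Ac, Qc)             # policy evaluation
        K = np.linalg.solve(R, B.T @ P_new)        # minimizing-player update
        L = (1.0 / gamma**2) * D.T @ P_new         # maximizing-player update
        if np.linalg.norm(P_new - P) < tol:
            return P_new, K, L
        P = P_new
    return P, K, L

A = np.array([[0.0, 1.0], [-1.0, -2.0]])   # open-loop stable drift (made up)
B = np.array([[0.0], [1.0]])               # control input channel
D = np.array([[0.0], [0.5]])               # disturbance input channel
Q, R, gamma = np.eye(2), np.array([[1.0]]), 5.0   # gamma: L2-gain bound

P, K, L = game_are_policy_iteration(A, B, D, Q, R, gamma)
residual = (A.T @ P + P @ A + Q
            - P @ B @ np.linalg.solve(R, B.T @ P)
            + (1.0 / gamma**2) * P @ D @ D.T @ P)
print(np.linalg.norm(residual))            # near zero at the Nash solution
```

In the model-free IRL version, the Lyapunov solve in the policy-evaluation step is replaced by a least-squares fit of the same value matrix from integrals of the measured cost along system trajectories, which is what removes the need for complete knowledge of the dynamics.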

