Reinforcement Q-Learning Algorithm for H∞ Tracking Control of Unknown Discrete-Time Linear Systems

IEEE Transactions on Systems, Man, and Cybernetics


Abstract

This article addresses online reinforcement $Q$-learning algorithms for designing an $H_{\infty}$ tracking controller for unknown discrete-time linear systems. An augmented system composed of the original system and the command generator is constructed, and a discounted performance function is introduced to establish a discounted game algebraic Riccati equation (GARE). Existence conditions for a solution to the GARE are proposed, and a lower bound on the discount factor is found that assures the stability of the $H_{\infty}$ tracking control solution. The $Q$-function Bellman equation is then derived, based on which the reinforcement $Q$-learning algorithm is developed to learn the solution to the $H_{\infty}$ tracking control problem without knowing the system dynamics. Both state-data-driven and output-data-driven reinforcement $Q$-learning algorithms for finding the control policies are proposed. Unlike the value function approximation (VFA)-based approach, the $Q$-learning scheme is proved to introduce no bias in the solution of the $Q$-function Bellman equation under probing noise satisfying the persistent excitation (PE) condition, and it therefore converges to the nominal discounted GARE solution. Moreover, the proposed output-data-driven method is more powerful than the state-data-driven method, since the full system state may not be completely measurable in practical applications. A simulation example with a single-phase voltage-source UPS inverter is used to verify the effectiveness of the proposed $Q$-learning algorithms.
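
For reference, the following is a minimal sketch of the discounted $H_{\infty}$ tracking formulation described above, written in the form standard in the literature; the symbols here ($\lambda$ for the discount factor, $\gamma$ for the attenuation level, $H$ for the $Q$-function kernel) are illustrative choices and may differ from the article's own notation. With augmented state $X_k = [x_k^{\top} \; r_k^{\top}]^{\top}$, where the command generator produces the reference $r_{k+1} = F r_k$ and $e_k$ denotes the tracking error, a discounted performance function of the form

$$J(u,w) = \sum_{k=0}^{\infty} \lambda^{k} \left( e_k^{\top} Q_e\, e_k + u_k^{\top} R\, u_k - \gamma^{2} w_k^{\top} w_k \right), \qquad 0 < \lambda \le 1,$$

leads to a quadratic $Q$-function $Q(X_k, u_k, w_k) = z_k^{\top} H z_k$ with $z_k = [X_k^{\top} \; u_k^{\top} \; w_k^{\top}]^{\top}$, whose Bellman equation

$$z_k^{\top} H z_k = e_k^{\top} Q_e\, e_k + u_k^{\top} R\, u_k - \gamma^{2} w_k^{\top} w_k + \lambda\, z_{k+1}^{\top} H z_{k+1}$$

can be solved for the kernel $H$ by least squares using measured data $(X_k, u_k, w_k, X_{k+1})$ alone, provided the probing noise added to the inputs is persistently exciting; the control and worst-case disturbance policies are then read off from the block partition of $H$ without using the system matrices.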