Output Feedback Q-Learning Control for the Discrete-Time Linear Quadratic Regulator Problem

Rizvi Syed Ali Asad; Lin Zongli

Abstract

Approximate dynamic programming (ADP) and reinforcement learning (RL) have emerged as important tools in the design of optimal and adaptive control systems. Most of the existing RL and ADP methods make use of full-state feedback, a requirement that is often difficult to satisfy in practical applications. As a result, output feedback methods are more desirable as they relax this requirement. In this paper, we present a new output feedback-based Q-learning approach to solving the linear quadratic regulation (LQR) control problem for discrete-time systems. The proposed scheme is completely online in nature and works without requiring the system dynamics information. More specifically, a new representation of the LQR Q-function is developed in terms of the input-output data. Based on this new Q-function representation, output feedback LQR controllers are designed. We present two output feedback iterative Q-learning algorithms based on the policy iteration and the value iteration methods. This scheme has the advantage that it does not incur any excitation noise bias, and therefore, the need of using discounted cost functions is circumvented, which in turn ensures closed-loop stability. It is shown that the proposed algorithms converge to the solution of the LQR Riccati equation. A comprehensive simulation study is carried out, which illustrates the proposed scheme.
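
As a rough illustration of the convergence claim above, the sketch below runs value iteration on a scalar LQR problem and reads the feedback gain off the quadratic Q-function. It is not the authors' algorithm: it assumes the model (a, b) is known and the full state is measured, whereas the paper works from input-output data without a model. The function names and the numeric values are hypothetical, chosen only for this example.

```python
def q_function_matrix(a, b, q, r, p):
    """Entries (H_xx, H_xu, H_uu) of the quadratic Q-function
    Q(x, u) = H_xx*x^2 + 2*H_xu*x*u + H_uu*u^2 for value kernel p,
    with dynamics x_next = a*x + b*u and stage cost q*x^2 + r*u^2."""
    h_xx = q + a * a * p
    h_xu = a * b * p
    h_uu = r + b * b * p
    return h_xx, h_xu, h_uu


def value_iteration(a, b, q, r, iterations=200):
    """Iterate the scalar Riccati recursion starting from p = 0
    (assumes the model is known; illustration only)."""
    p = 0.0
    for _ in range(iterations):
        h_xx, h_xu, h_uu = q_function_matrix(a, b, q, r, p)
        # Minimising the Q-function over u gives the next value kernel.
        p = h_xx - h_xu * h_xu / h_uu
    return p


def greedy_gain(a, b, q, r, p):
    """Feedback gain k in u = -k*x, read off the Q-function matrix."""
    _, h_xu, h_uu = q_function_matrix(a, b, q, r, p)
    return h_xu / h_uu


if __name__ == "__main__":
    p_star = value_iteration(a=1.0, b=1.0, q=1.0, r=1.0)
    print(p_star, greedy_gain(1.0, 1.0, 1.0, 1.0, p_star))
```

For a = b = q = r = 1 the fixed point of the recursion satisfies p^2 - p - 1 = 0, so the iterate should approach the positive root of that equation, which gives a simple check that the recursion reaches the Riccati solution.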

Bibliographic record

Source: Neural Networks and Learning Systems, IEEE Transactions on, 2019, no. 5, pp. 1523-1536 (14 pages)
Authors: Rizvi Syed Ali Asad; Lin Zongli
Author affiliation (listed for both authors): Univ Virginia, Charles L Brown Dept Elect & Comp Engn, Charlottesville, VA 22904 USA
Original format: PDF
Language: English
Keywords: Approximate dynamic programming (ADP); linear quadratic regulation (LQR); output feedback; Q-learning; reinforcement learning (RL)

Similar documents

1. Output Feedback Q-Learning Control for the Discrete-Time Linear Quadratic Regulator Problem [J]. Rizvi Syed Ali Asad, Lin Zongli. Neural Networks and Learning Systems, IEEE Transactions on. 2019, no. 5.
2. Experience replay-based output feedback Q-learning scheme for optimal output tracking control of discrete-time linear systems [J]. Rizvi Syed Ali Asad, Lin Zongli. International Journal of Adaptive Control and Signal Processing. 2019, no. 12.
3. Adaptive optimal output feedback tracking control for unknown discrete-time linear systems using a combined reinforcement Q-learning and internal model method [J]. Control Theory & Applications, IET. 2019, no. 18.
4. Output feedback reinforcement Q-learning control for the discrete-time linear quadratic regulator problem [C]. Syed Ali Asad Rizvi, Zongli Lin. IEEE Annual Conference on Decision and Control. 2017.
5. Linear and Nonlinear Adaptive Attitude Control of Asteroid-orbiting Spacecraft Using State Feedback and Output Feedback [D]. Moya, Nicholas. 2018.
6. Fuzzy ... formula ... output-feedback control for the discrete-time system with channel fadings sector nonlinearities and randomly occurring interval delays and nonlinearities [O]. Xiaozheng Fan, Yan Wang, Manfeng Hu.
7. Output Feedback H∞ Control for Linear Discrete-Time Multi-Player Systems With Multi-Source Disturbances Using Off-Policy Q-Learning [O]. Zhenfei Xiao, Jinna Li, Ping Li. 2020.
