Adaptive Optimal Control of Unknown Constrained-Input Systems Using Policy Iteration and Neural Networks

Modares H.; Lewis F.L.; Naghibi-Sistani M.-B.

Abstract

This paper presents an online policy iteration (PI) algorithm to learn the continuous-time optimal control solution for unknown constrained-input systems. The proposed PI algorithm is implemented on an actor–critic structure where two neural networks (NNs) are tuned online and simultaneously to generate the optimal bounded control policy. The requirement of complete knowledge of the system dynamics is obviated by employing a novel NN identifier in conjunction with the actor and critic NNs. It is shown how the estimation error of the identifier weights affects the convergence of the critic NN. A novel learning rule is developed to guarantee that the identifier weights converge exponentially fast to small neighborhoods of their ideal values. To provide an easy-to-check persistence of excitation condition, the experience replay technique is used; that is, recorded past experiences are used simultaneously with current data to adapt the identifier weights. Stability of the whole system, consisting of the actor, critic, system state, and system identifier, is guaranteed while all three networks undergo adaptation. Convergence to a near-optimal control law is also shown. The effectiveness of the proposed method is illustrated with a simulation example.
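
Policy iteration alternates policy evaluation (computing the value of the current policy) and policy improvement. The paper performs both steps online with critic and actor NNs and an identified model; the sketch below instead assumes a known, unconstrained scalar linear system dx/dt = a*x + b*u with quadratic cost integral of (q*x^2 + r*u^2) dt. Under these assumptions (which are not the paper's setting), PI reduces to Kleinman's algorithm and its result can be checked against the Riccati solution. The function names are hypothetical; this illustrates the PI loop, not the paper's algorithm.

```python
import math


def policy_iteration_scalar(a, b, q, r, k0, iters=20):
    """Policy iteration (Kleinman's algorithm) for dx/dt = a*x + b*u with
    cost = integral of (q*x**2 + r*u**2) dt and linear policies u = -k*x.
    k0 must be stabilizing, i.e. a - b*k0 < 0."""
    k = k0
    history = []
    for _ in range(iters):
        a_cl = a - b * k
        if a_cl >= 0:
            raise ValueError("policy u = -k*x is not stabilizing")
        # Policy evaluation: V(x) = p*x**2 solves the Lyapunov equation
        # 2*a_cl*p + q + r*k**2 = 0.
        p = -(q + r * k * k) / (2.0 * a_cl)
        history.append(p)
        # Policy improvement: u = -(b / (2*r)) * dV/dx = -(b*p/r) * x.
        k = b * p / r
    return p, k, history


def riccati_scalar(a, b, q, r):
    """Stabilizing root of the scalar Riccati equation (b**2/r)*p**2 - 2*a*p - q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


p, k, history = policy_iteration_scalar(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
print(p, riccati_scalar(1.0, 1.0, 1.0, 1.0))  # both approximately 2.4142 (1 + sqrt(2))
```

Each evaluation step here uses the drift a explicitly; the point of the paper's NN identifier is to remove the need for complete knowledge of the dynamics.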
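
The abstract does not state how the bound on the control is enforced. A common approach in this line of work, assumed here, is the nonquadratic input penalty of Abu-Khalaf and Lewis (2005): for a scalar input with bound lam, W(u) = 2 * integral from 0 to u of lam * atanh(v/lam) * r dv. Minimizing the Hamiltonian W(u) + (dV/dx)(f + g*u) over u then gives the saturated policy u = -lam * tanh(g * (dV/dx) / (2*lam*r)). The sketch below evaluates both in closed form for a scalar input; the names and numbers are illustrative only.

```python
import math


def bounded_policy(grad_v, g, r, lam):
    """Scalar bounded policy u = -lam * tanh(g * dV/dx / (2 * lam * r)).
    The output never exceeds lam in magnitude, whatever dV/dx is."""
    return -lam * math.tanh(g * grad_v / (2.0 * lam * r))


def input_cost(u, r, lam):
    """Nonquadratic input penalty W(u) = 2 * int_0^u lam * atanh(v / lam) * r dv,
    evaluated in closed form; defined only for |u| < lam."""
    if abs(u) >= lam:
        raise ValueError("|u| must be strictly below the bound lam")
    ratio = u / lam
    return 2.0 * lam * r * (u * math.atanh(ratio)
                            + 0.5 * lam * math.log(1.0 - ratio * ratio))


# Usage: dV/dx = 1.5, g = 2, r = 0.5, lam = 1 gives u = -tanh(3.0), about -0.995.
u = bounded_policy(1.5, 2.0, 0.5, 1.0)
# Stationarity of the Hamiltonian in u: dW/du + g * dV/dx = 0 at the policy.
dW_du = 2.0 * 1.0 * 0.5 * math.atanh(u / 1.0)
print(dW_du + 2.0 * 1.5)  # approximately 0
```

The tanh form keeps the control inside the bound by construction, while W(u) grows without limit as |u| approaches lam.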
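
Experience replay, as described in the abstract, reuses recorded past samples alongside current data when adapting the identifier weights, so that a rank condition on the recorded data can stand in for persistence of excitation of the current signal. The sketch below is a hypothetical discrete-time gradient version for a linear-in-parameters model y = w . phi with measured y; it is not the paper's continuous-time learning rule, and all names and data are made up for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def replay_gradient_id(current, stack, n_params, lr=0.05, use_replay=True):
    """Gradient estimation of w in y = w . phi.

    current: (phi, y) pairs, one consumed per update step.
    stack:   recorded past (phi, y) pairs reused at every step (experience replay).
    """
    w_hat = [0.0] * n_params
    for phi, y in current:
        batch = [(phi, y)] + (stack if use_replay else [])
        step = [0.0] * n_params
        for phi_j, y_j in batch:
            e_j = y_j - dot(w_hat, phi_j)  # prediction error on sample j
            for i in range(n_params):
                step[i] += phi_j[i] * e_j
        w_hat = [w + lr * s for w, s in zip(w_hat, step)]
    return w_hat


def stack_is_rich_2d(stack):
    """Rank condition for two parameters: sum of phi phi^T over the stack is nonsingular."""
    s11 = sum(p[0] * p[0] for p, _ in stack)
    s22 = sum(p[1] * p[1] for p, _ in stack)
    s12 = sum(p[0] * p[1] for p, _ in stack)
    return s11 * s22 - s12 * s12 > 1e-9


TRUE_W = [2.0, -1.0]  # hypothetical "ideal" weights


def sample(phi):
    return (phi, dot(TRUE_W, phi))


stack = [sample([1.0, 0.5]), sample([0.2, 1.0]), sample([-0.5, 0.8])]
current = [sample([1.0, 0.0])] * 500  # current data excites only the first weight

print(stack_is_rich_2d(stack))  # True
print(replay_gradient_id(current, stack, 2))  # close to [2.0, -1.0]
print(replay_gradient_id(current, stack, 2, use_replay=False))  # second weight stays 0.0
```

Because the current regressor excites only the first weight, the update without replay never moves the second estimate, whereas the recorded stack, whose regressors span the parameter space, drives both estimates to their true values.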

Bibliographic Information

Source: IEEE Transactions on Neural Networks and Learning Systems, 2013, No. 10, pp. 1513-1525 (13 pages)
Authors: Modares H.; Lewis F.L.; Naghibi-Sistani M.-B.
Affiliation: Department of Electrical Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
Original format: PDF
Language: English
Keywords: Input constraints; neural networks; optimal control; reinforcement learning; unknown dynamics

Similar Literature

1. Event-Triggered Optimal Control for Partially Unknown Constrained-Input Systems via Adaptive Dynamic Programming [J]. Yuanheng Zhu, Dongbin Zhao, Haibo He. IEEE Transactions on Industrial Electronics, 2017, No. 5.

2. Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems [J]. Hamidreza Modares, Frank L. Lewis, Mohammad-Bagher Naghibi-Sistani. Automatica, 2014, No. 1.

3. A policy iteration approach to online optimal control of continuous-time constrained-input systems [J]. Hamidreza Modares, Mohammad-Bagher Naghibi Sistani, Frank L. Lewis. ISA Transactions, 2013, No. 5.

4. Adaptive Optimal Control of Partially-unknown Constrained-input Systems using Policy Iteration with Experience Replay [C]. Hamidreza Modares, Frank L. Lewis, Mohammad-Bagher Naghibi-Sistani. AIAA Guidance, Navigation, and Control Conference, 2013.

5. On-line modeling and inverse optimal control of a class of unknown nonlinear systems using dynamic neural networks [D]. Farid, Farshad. 2006.

6. Neural Network L1 Adaptive Control of MIMO Systems with Nonlinear Uncertainty [O]. Hong-tao Zhen, Xiao-hui Qi, Jie Li.

7. Neural-network based online policy iteration for continuous-time infinite-horizon optimal control of nonlinear systems [O]. Tang D., Chen L., Tian Z.F. 2015.
