IEEE Transactions on Neural Networks and Learning Systems

Integral Reinforcement Learning for Continuous-Time Input-Affine Nonlinear Systems With Simultaneous Invariant Explorations



Abstract

This paper focuses on a class of reinforcement learning (RL) algorithms, named integral RL (I-RL), that solve continuous-time (CT) nonlinear optimal control problems with input-affine system dynamics. First, we extend the concepts of exploration, integral temporal difference, and invariant admissibility to the target CT nonlinear system, which is governed by a control policy plus a probing signal called an exploration. Then, we show input-to-state stability (ISS) and invariant admissibility of the closed-loop systems under the policies generated by the integral policy iteration (I-PI) or invariantly admissible PI (IA-PI) methods. Based on these results, three online I-RL algorithms, named explorized I-PI and integral Q-learning I, II, are proposed, all of which generate the same convergent sequences as I-PI and IA-PI under the required excitation condition on the exploration. All the proposed methods are partially or completely model-free, and can simultaneously explore the state space in a stable manner during the online learning process. ISS, invariant admissibility, and convergence properties of the proposed methods are also investigated, and in relation to these, we present design principles for the exploration that ensure safe learning. Neural-network-based implementation methods for the proposed schemes are also presented. Finally, several numerical simulations are carried out to verify the effectiveness of the proposed methods.
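To make the mechanism concrete, the integral relation underlying such explorized I-PI schemes typically takes the following form (a reconstruction from the standard integral RL setting, not quoted from the paper). For input-affine dynamics \dot{x} = f(x) + g(x)u with running cost q(x) + u^\top R u, the applied input is u = u_i + e, where u_i is the current policy and e is the exploration. Along any trajectory,

    V_i(x(t+T)) - V_i(x(t)) = -\int_t^{t+T} \big( q(x) + u_i^\top R u_i \big)\, d\tau - 2\int_t^{t+T} u_{i+1}^\top R\, e\, d\tau,
    u_{i+1}(x) = -\tfrac{1}{2} R^{-1} g(x)^\top \nabla V_i(x),

so the value V_i and the improved policy u_{i+1} appear together in one data-based equation in which f and g never occur explicitly; parameterizing both and solving by least squares over many intervals [t, t+T] gives a model-free update. The Python sketch below does this for a scalar linear-quadratic special case (\dot{x} = a x + b u, cost integrand x^2 + u^2, with bases V_i = p x^2 and u_{i+1} = w x). The plant parameters a, b, the interval length T, and all variable names are illustrative assumptions; a and b are used only by the simulator, never by the learner.

import numpy as np

# Minimal sketch of explorized integral policy iteration on a scalar
# linear-quadratic problem. Illustrative assumption throughout: the
# learner sees only trajectory data (x, u_i, e), not the plant (a, b).

rng = np.random.default_rng(0)
a, b = 1.0, 1.0            # true dynamics, used only inside the simulator
dt, T = 1e-3, 0.05         # Euler step and reinforcement-interval length
steps = int(T / dt)

def collect(x, gain, n_intervals):
    """Simulate under u = gain*x + e and build one regression row per interval."""
    rows, rhs = [], []
    for _ in range(n_intervals):
        x0, cost_int, xe_int = x, 0.0, 0.0
        for _ in range(steps):
            e = 0.3 * rng.standard_normal()      # exploration signal
            u_pol = gain * x                     # policy part u_i(x)
            cost_int += (x**2 + u_pol**2) * dt   # int (q + u_i' R u_i) dtau
            xe_int += x * e * dt                 # int x e dtau
            x += (a * x + b * (u_pol + e)) * dt  # Euler step of the plant
        # p*(x_T^2 - x_0^2) + 2*w*int(x e) = -int(x^2 + u_i^2); unknowns (p, w)
        rows.append([x**2 - x0**2, 2.0 * xe_int])
        rhs.append(-cost_int)
    return np.array(rows), np.array(rhs)

gain = -2.0                                      # initial admissible policy u_0 = gain*x
for i in range(6):
    A, c = collect(rng.uniform(0.8, 1.2), gain, n_intervals=40)
    p, w = np.linalg.lstsq(A, c, rcond=None)[0]  # critic weight p, actor weight w
    gain = w                                     # policy improvement u_{i+1} = w*x
    print(f"iteration {i}: p = {p:.4f}, gain = {gain:.4f}")

For this toy problem the iterates should approach the scalar Riccati solution p = 1 + \sqrt{2} \approx 2.414 with gain \approx -2.414. Note how the exploration e both excites the regression and is compensated exactly by the 2\int u_{i+1}^\top R e\, d\tau term, which reflects the stable simultaneous exploration described in the abstract.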
