IEEE Transactions on Neural Networks and Learning Systems

Integral Reinforcement Learning for Continuous-Time Input-Affine Nonlinear Systems With Simultaneous Invariant Explorations



Abstract

This paper focuses on a class of reinforcement learning (RL) algorithms, named integral RL (I-RL), that solve continuous-time (CT) nonlinear optimal control problems with input-affine system dynamics. First, we extend the concepts of exploration, integral temporal difference, and invariant admissibility to the target CT nonlinear system, which is governed by a control policy plus a probing signal called an exploration. Then, we show input-to-state stability (ISS) and invariant admissibility of the closed-loop systems under the policies generated by the integral policy iteration (I-PI) or invariantly admissible PI (IA-PI) methods. Based on these results, three online I-RL algorithms, named explorized I-PI and integral Q-learning I, II, are proposed, all of which generate the same convergent sequences as I-PI and IA-PI under the required excitation condition on the exploration. All the proposed methods are partially or completely model-free, and can simultaneously explore the state space in a stable manner during the online learning process. ISS, invariant admissibility, and convergence properties of the proposed methods are also investigated, and in relation to these, we present design principles for the exploration that ensure safe learning. Neural-network-based implementation methods for the proposed schemes are also presented. Finally, several numerical simulations are carried out to verify the effectiveness of the proposed methods.
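To make the mechanism concrete, the integral relation underlying such explorized I-PI schemes typically takes the following form (a reconstruction from the standard integral RL setting, not quoted from the paper). For input-affine dynamics \dot{x} = f(x) + g(x)u with running cost q(x) + u^\top R u, the applied input is u = u_i + e, where u_i is the current policy and e is the exploration. Along any trajectory,

    V_i(x(t+T)) - V_i(x(t)) = -\int_t^{t+T} \big( q(x) + u_i^\top R u_i \big)\, d\tau - 2\int_t^{t+T} u_{i+1}^\top R\, e\, d\tau,
    u_{i+1}(x) = -\tfrac{1}{2} R^{-1} g(x)^\top \nabla V_i(x),

so the value V_i and the improved policy u_{i+1} appear together in one data-based equation in which f and g never occur explicitly; parameterizing both and solving by least squares over many intervals [t, t+T] gives a model-free update. The Python sketch below does this for a scalar linear-quadratic special case (\dot{x} = a x + b u, cost integrand x^2 + u^2, with bases V_i = p x^2 and u_{i+1} = w x). The plant parameters a, b, the interval length T, and all variable names are illustrative assumptions; a and b are used only by the simulator, never by the learner.

import numpy as np

# Minimal sketch of explorized integral policy iteration on a scalar
# linear-quadratic problem. Illustrative assumption throughout: the
# learner sees only trajectory data (x, u_i, e), not the plant (a, b).

rng = np.random.default_rng(0)
a, b = 1.0, 1.0            # true dynamics, used only inside the simulator
dt, T = 1e-3, 0.05         # Euler step and reinforcement-interval length
steps = int(T / dt)

def collect(x, gain, n_intervals):
    """Simulate under u = gain*x + e and build one regression row per interval."""
    rows, rhs = [], []
    for _ in range(n_intervals):
        x0, cost_int, xe_int = x, 0.0, 0.0
        for _ in range(steps):
            e = 0.3 * rng.standard_normal()      # exploration signal
            u_pol = gain * x                     # policy part u_i(x)
            cost_int += (x**2 + u_pol**2) * dt   # int (q + u_i' R u_i) dtau
            xe_int += x * e * dt                 # int x e dtau
            x += (a * x + b * (u_pol + e)) * dt  # Euler step of the plant
        # p*(x_T^2 - x_0^2) + 2*w*int(x e) = -int(x^2 + u_i^2); unknowns (p, w)
        rows.append([x**2 - x0**2, 2.0 * xe_int])
        rhs.append(-cost_int)
    return np.array(rows), np.array(rhs)

gain = -2.0                                      # initial admissible policy u_0 = gain*x
for i in range(6):
    A, c = collect(rng.uniform(0.8, 1.2), gain, n_intervals=40)
    p, w = np.linalg.lstsq(A, c, rcond=None)[0]  # critic weight p, actor weight w
    gain = w                                     # policy improvement u_{i+1} = w*x
    print(f"iteration {i}: p = {p:.4f}, gain = {gain:.4f}")

For this toy problem the iterates should approach the scalar Riccati solution p = 1 + \sqrt{2} \approx 2.414 with gain \approx -2.414. Note how the exploration e both excites the regression and is compensated exactly by the 2\int u_{i+1}^\top R e\, d\tau term, which reflects the stable simultaneous exploration described in the abstract.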
