
Robust adaptive dynamic programming for continuous-time linear and nonlinear systems.


Abstract

The field of adaptive dynamic programming (ADP) and its applications to control engineering problems have undergone rapid progress over the past few years. Recently, a new theory called Robust Adaptive Dynamic Programming (RADP, for short) has been developed for the design of robust optimal controllers for linear and nonlinear systems subject to both parametric and dynamic uncertainties. This dissertation integrates our recent contributions to the development of the theory of RADP and illustrates its potential applications in both engineering and biological systems.

In order to develop the RADP framework, our attention is first focused on an ADP-based online learning method for continuous-time (CT) linear systems with completely unknown system dynamics. This problem is challenging because the CT and discrete-time (DT) algebraic Riccati equations (AREs) have different structures, so methods developed for DT ADP cannot be applied directly in the CT setting. We overcome this obstacle by taking advantage of exploration noise. The methodology is then extended to CT affine nonlinear systems via neural-network-based approximation of the Hamilton-Jacobi-Bellman (HJB) equation, whose solution is extremely difficult to obtain analytically. To achieve global stabilization, we propose, for the first time, the idea of global ADP (GADP), in which the problem of solving the HJB equation is relaxed to an optimization problem, a suboptimal solution of which is obtained via a sum-of-squares-program-based policy iteration method. The resulting control policy is globally stabilizing, instead of only semi-globally or locally stabilizing.

We then develop RADP aimed at computing globally stabilizing and suboptimal control policies in the presence of dynamic uncertainties.
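The backbone of the CT linear result is policy iteration on the ARE (Kleinman's algorithm): each step solves a Lyapunov equation, which is linear in the unknown matrix, instead of the quadratic Riccati equation itself. The dissertation's ADP scheme carries out this same iteration from online trajectory data with exploration noise, without knowing the system matrix; a minimal model-based sketch of the underlying iteration (the matrices and gains below are illustrative, not taken from the dissertation) is:

```python
# Kleinman policy iteration for the CT LQR problem:
#   minimize  integral of x'Qx + u'Ru   subject to  dx/dt = Ax + Bu.
# Each step solves a Lyapunov equation (linear in P); the gain K
# converges quadratically to the optimal LQR gain.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

def kleinman_pi(A, B, Q, R, K0, iters=20):
    """Policy iteration; K0 must make A - B @ K0 Hurwitz (admissible)."""
    K = K0
    for _ in range(iters):
        Ak = A - B @ K                    # closed loop under current policy
        Qk = Q + K.T @ R @ K              # running cost of current policy
        # Policy evaluation: solve Ak' P + P Ak + Qk = 0 for P
        P = solve_continuous_lyapunov(Ak.T, -Qk)
        # Policy improvement: K <- R^{-1} B' P
        K = np.linalg.solve(R, B.T @ P)
    return P, K

# Illustrative second-order example; A is Hurwitz, so K0 = 0 is admissible.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)
P, K = kleinman_pi(A, B, Q, R, K0=np.zeros((1, 2)))
```

The iterate `P` converges to the stabilizing solution of the CT ARE (compare with `scipy.linalg.solve_continuous_are`); the data-driven version in the dissertation replaces the Lyapunov solve with a least-squares problem built from state and input measurements.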
A key strategy is to integrate ADP theory with techniques from modern nonlinear control, with the objective of filling a gap in the ADP literature, which has not previously taken dynamic uncertainties into account. The development of this framework consists of two major steps. First, we study an RADP method for partially linear systems (i.e., linear systems with nonlinear dynamic uncertainties) and weakly nonlinear large-scale systems. Global stabilization of these systems can be achieved by selecting performance indices with appropriate weights for the nominal system. Second, we extend the RADP framework to affine nonlinear systems with nonlinear dynamic uncertainties. To achieve robust stabilization, we resort to tools from nonlinear control theory, such as gain assignment and the ISS nonlinear small-gain theorem.

From the perspective of RADP, we derive a novel computational mechanism for sensorimotor control. Sharing essential features of reinforcement learning, which was originally observed in mammals, the RADP model for sensorimotor control suggests that, instead of identifying the dynamics of both the motor system and the environment, the central nervous system (CNS) iteratively computes a robust optimal control policy from real-time sensory data. By comparing our numerical results with experimentally observed data, we show that the proposed model can reproduce movement trajectories consistent with experimental observations. In addition, the RADP theory provides a unified framework connecting the optimality and robustness properties of the sensorimotor system. We therefore argue that the CNS may use RADP-like learning strategies to coordinate movements and to achieve successful adaptation in the presence of static and/or dynamic uncertainties.
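The robust stabilization step above invokes the ISS nonlinear small-gain theorem; in its standard form (generic notation, not the dissertation's), the condition reads:

```latex
% Feedback interconnection of two ISS subsystems:
%   \Sigma_1 : \dot{x}_1 = f_1(x_1, x_2), \qquad
%   \Sigma_2 : \dot{x}_2 = f_2(x_2, x_1),
% with ISS gains $\gamma_1$ and $\gamma_2$ (class-$\mathcal{K}$ functions).
% The small-gain condition requires their composition to be a contraction:
\[
  \gamma_1 \circ \gamma_2 (s) < s, \qquad \forall\, s > 0,
\]
% under which the interconnection is globally asymptotically stable at the
% origin. In RADP, gain assignment shapes the learned suboptimal policy so
% that the nominal subsystem satisfies this condition against the dynamic
% uncertainty.
```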

Bibliographic details

  • Author

    Jiang, Yu.

  • Affiliation

    Polytechnic Institute of New York University.

  • Degree grantor: Polytechnic Institute of New York University.
  • Subjects: Engineering, Electronics and Electrical; Engineering, Computer; Engineering, General.
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 257 p.
  • Total pages: 257
  • Format: PDF
  • Language: English
