Journal of Process Control

A model-based deep reinforcement learning method applied to finite-horizon optimal control of nonlinear control-affine system


Abstract

The Hamilton-Jacobi-Bellman (HJB) equation can be solved to obtain optimal closed-loop control policies for general nonlinear systems. Since it is seldom possible to solve the HJB equation exactly for nonlinear systems, either analytically or numerically, methods that build approximate solutions through simulation-based learning have been studied under various names such as neuro-dynamic programming (NDP) and approximate dynamic programming (ADP). The learning aspect connects these methods to reinforcement learning (RL), which likewise seeks to learn optimal decision policies through trial-and-error learning. This study develops a model-based RL method that iteratively learns the solution to the HJB equation and its associated equations. We focus in particular on control-affine systems with a quadratic objective function and on the finite-horizon optimal control (FHOC) problem with time-varying reference trajectories. The HJB solutions for such systems involve time-varying value, costate, and policy functions subject to boundary conditions. To represent the time-varying HJB solution in a high-dimensional state space in a general and efficient way, deep neural networks (DNNs) are employed. It is shown that, compared to shallow neural networks (SNNs), the use of DNNs can significantly improve the performance of a learned policy in the presence of uncertain initial states and state noise. Examples involving a batch chemical reactor and a one-dimensional diffusion-convection-reaction system are used to demonstrate this and other key aspects of the method. (C) 2020 Elsevier Ltd. All rights reserved.
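For context, the FHOC problem class described above is typically posed as follows; the notation here is a standard formulation supplied by us, not reproduced from the paper. For control-affine dynamics and a quadratic tracking objective,

\dot{x} = f(x) + g(x)\,u, \qquad
J = \|x(t_f) - x_{\mathrm{ref}}(t_f)\|_{Q_f}^2 + \int_{t_0}^{t_f} \Big( \|x - x_{\mathrm{ref}}(t)\|_{Q}^2 + \|u\|_{R}^2 \Big)\, dt,

the time-varying value function V(t, x) satisfies the HJB equation with a terminal boundary condition,

-\frac{\partial V}{\partial t} = \min_{u} \Big\{ \|x - x_{\mathrm{ref}}(t)\|_{Q}^2 + \|u\|_{R}^2 + (\nabla_x V)^{\top} \big( f(x) + g(x)\,u \big) \Big\}, \qquad
V(t_f, x) = \|x - x_{\mathrm{ref}}(t_f)\|_{Q_f}^2,

and, because the cost is quadratic in u and the dynamics are affine in u, the inner minimization has the closed form

u^{*}(t, x) = -\tfrac{1}{2} R^{-1} g(x)^{\top} \nabla_x V(t, x), \qquad \lambda(t, x) = \nabla_x V(t, x),

where \lambda is the costate. The sketch below illustrates, in JAX, the general idea of representing V with a DNN and training it on the HJB residual plus the terminal condition. It is a minimal illustration under our own simplifying assumptions (scalar state, zero reference, zero terminal cost, plain gradient descent), not the paper's algorithm or network architecture.

import jax
import jax.numpy as jnp

# Placeholder scalar control-affine dynamics xdot = f(x) + g(x) u
# (illustrative choices, not the paper's example systems).
def f(x):
    return -x

def g(x):
    return 1.0

Q, R, tf = 1.0, 1.0, 1.0   # stage-cost weights and horizon (illustrative)

def init_params(key, sizes=(2, 64, 64, 1)):
    # Small fully connected DNN; layer sizes are arbitrary here.
    params = []
    for din, dout in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (din, dout)) / jnp.sqrt(din),
                       jnp.zeros(dout)))
    return params

def V(params, t, x):
    # DNN value function V_theta(t, x) -> scalar; time is an extra input,
    # which is one simple way to represent a time-varying HJB solution.
    h = jnp.array([t, x])
    for W, b in params[:-1]:
        h = jnp.tanh(h @ W + b)
    W, b = params[-1]
    return (h @ W + b)[0]

def hjb_residual(params, t, x):
    dVdt = jax.grad(V, argnums=1)(params, t, x)
    dVdx = jax.grad(V, argnums=2)(params, t, x)   # costate lambda = dV/dx
    u = -0.5 / R * g(x) * dVdx                    # closed-form HJB minimizer
    return dVdt + Q * x**2 + R * u**2 + dVdx * (f(x) + g(x) * u)

def loss(params, ts, xs):
    # Penalize the HJB residual at sampled (t, x) collocation points plus
    # the terminal boundary condition (terminal cost taken as zero here).
    res = jax.vmap(hjb_residual, in_axes=(None, 0, 0))(params, ts, xs)
    v_tf = jax.vmap(V, in_axes=(None, None, 0))(params, tf, xs)
    return jnp.mean(res**2) + jnp.mean(v_tf**2)

@jax.jit
def step(params, ts, xs, lr=1e-3):
    grads = jax.grad(loss)(params, ts, xs)
    return jax.tree_util.tree_map(lambda p, dp: p - lr * dp, params, grads)

key = jax.random.PRNGKey(0)
params = init_params(key)
for _ in range(2000):
    key, kt, kx = jax.random.split(key, 3)
    ts = jax.random.uniform(kt, (256,)) * tf   # sample times in [0, tf]
    xs = jax.random.normal(kx, (256,))         # sample states
    params = step(params, ts, xs)

Once trained, the policy is recovered pointwise from the value gradient, u = -0.5 / R * g(x) * jax.grad(V, argnums=2)(params, t, x), mirroring the closed-form minimizer above.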