Information Sciences: An International Journal

Multi-step heuristic dynamic programming for optimal control of nonlinear discrete-time systems

Abstract

Policy iteration and value iteration are the two main iterative adaptive dynamic programming frameworks for solving optimal control problems. Policy iteration converges quickly but requires an initial stabilizing control policy, a strict constraint in practice. Value iteration avoids the requirement of an initial admissible control policy but converges much more slowly. This paper seeks to combine the advantages of policy iteration and value iteration while avoiding their respective drawbacks. To this end, a multi-step heuristic dynamic programming (MsHDP) method is developed for solving the optimal control problem of nonlinear discrete-time systems. MsHDP converges faster than value iteration and, at the same time, avoids the initial admissible control policy required by policy iteration. The convergence theory of MsHDP is established by proving that it converges to the solution of the Bellman equation. For implementation, an actor-critic neural network (NN) structure is developed. The critic NN estimates the value function, and its weight vector is computed with a least-squares scheme; the actor NN estimates the control policy, and a gradient descent method is proposed for updating its weight vector. Comparative simulation studies on two examples verify the effectiveness and advantages of MsHDP. (C) 2017 Elsevier Inc. All rights reserved.
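The abstract does not state the update rule explicitly. As a hedged reconstruction, not taken from the paper, an N-step backup that reduces to value iteration at N = 1 and approaches policy iteration as N grows could take the following form, for assumed dynamics x_{t+1} = f(x_t, u_t) and stage cost r(x, u):

```latex
% Hedged reconstruction (not stated in the abstract): an N-step backup that
% reduces to value iteration at N = 1 and approaches policy iteration as
% N grows, for dynamics x_{t+1} = f(x_t, u_t) and stage cost r(x, u).
\begin{align*}
  \pi_k(x)   &= \arg\min_{u} \bigl[\, r(x, u) + V_k(f(x, u)) \,\bigr],\\
  V_{k+1}(x) &= \sum_{t=0}^{N-1} r\bigl(x_t, \pi_k(x_t)\bigr) + V_k(x_N),
  \qquad x_0 = x,\quad x_{t+1} = f\bigl(x_t, \pi_k(x_t)\bigr).
\end{align*}
```

Under this reading, MsHDP replaces the slow one-step backup of value iteration with an N-step backup, without ever requiring the fully converged policy evaluation that forces policy iteration to start from an admissible policy.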
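For concreteness, below is a minimal, self-contained sketch of such a multi-step actor-critic scheme on a toy scalar system. Everything here, including the dynamics f, the cost r, the polynomial critic features, the two-parameter actor, and the horizon N_STEPS, is an illustrative assumption rather than the paper's implementation; it only mirrors the structure the abstract describes (least-squares critic update, gradient-descent actor update).

```python
# Hedged sketch of a multi-step HDP-style iteration on a toy scalar system.
# All names (f, r, phi, N_STEPS) are illustrative assumptions, not the
# paper's actual implementation.
import numpy as np

rng = np.random.default_rng(0)

def f(x, u):           # example nonlinear discrete-time dynamics (assumed)
    return 0.8 * np.sin(x) + u

def r(x, u):           # quadratic stage cost (assumed)
    return x**2 + u**2

def phi(x):            # polynomial critic features (assumed)
    return np.array([x**2, x**4, x**6])

def V(w, x):           # critic: value estimate V(x) ~ w^T phi(x)
    return phi(x) @ w

def policy(theta, x):  # actor: two-parameter control law (assumed)
    return theta[0] * x + theta[1] * np.sin(x)

N_STEPS = 3            # multi-step horizon of the value update
w = np.zeros(3)
theta = np.array([-0.5, 0.0])

for k in range(50):
    # --- critic update: least-squares fit of the N-step Bellman target ---
    xs = rng.uniform(-2, 2, size=200)
    targets = np.empty_like(xs)
    for i, x0 in enumerate(xs):
        x, cost = x0, 0.0
        for _ in range(N_STEPS):          # roll the current policy N steps
            u = policy(theta, x)
            cost += r(x, u)
            x = f(x, u)
        targets[i] = cost + V(w, x)       # bootstrap with current critic
    Phi = np.stack([phi(x) for x in xs])
    w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)

    # --- actor update: gradient descent on the one-step Q-value ---
    lr, eps = 0.05, 1e-4
    grad = np.zeros_like(theta)
    for x in xs:
        u = policy(theta, x)
        q_plus = r(x, u + eps) + V(w, f(x, u + eps))   # finite-difference
        q_minus = r(x, u - eps) + V(w, f(x, u - eps))  # estimate of dQ/du
        dq_du = (q_plus - q_minus) / (2 * eps)
        grad += dq_du * np.array([x, np.sin(x)])       # chain rule: du/dtheta
    theta -= lr * grad / len(xs)

print("critic weights:", w)
print("actor weights:", theta)
```

The horizon N_STEPS controls the trade-off described in the abstract: N_STEPS = 1 recovers a value-iteration-style backup, while larger values push the critic target toward a full policy evaluation without ever requiring an admissible initial policy.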
