A partial policy iteration ADP algorithm for nonlinear neuro-optimal control with discounted total reward

Liang Mingming; Wei Qinglai

Abstract

This paper constructs a partial policy iteration adaptive dynamic programming (ADP) algorithm to solve the optimal control problem of nonlinear systems with discounted total reward. Compared with the traditional policy iteration ADP algorithm, the approach updates the iterative control law only in a local region of the global system state space. Owing to this feature, the overall computational burden on processing units at each iteration can be significantly reduced. This enables the algorithm to be executed on low-performance devices such as smartphones, smartwatches and Internet of Things (IoT) objects. We provide a convergence analysis showing that the generated sequence of value functions is monotonically nonincreasing and can finally reach a local optimum. In addition, the corresponding local policy space is developed theoretically for the first time. Moreover, when the sequence of local system state spaces is chosen properly, we prove that the developed algorithm can find the global optimal performance index function for the nonlinear systems. Finally, we present a numerical simulation to demonstrate the effectiveness of the proposed algorithm. (c) 2020 Elsevier B.V. All rights reserved.
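The core idea in the abstract can be illustrated with a small tabular sketch. The algorithm evaluates the current control law but improves it only on a local region of the state space. This is not the authors' algorithm. The scalar dynamics, quadratic utility, grid, discount factor and four-block region partition are all hypothetical. The discounted total is treated as a cost to be minimized, consistent with the nonincreasing value sequence described above. Exact tabular policy evaluation stands in for the neural-network approximators that an ADP implementation would normally use.

```python
import numpy as np

# Hypothetical test problem (not from the paper): x_{k+1} = 0.9*sin(x_k) + u_k,
# utility U(x, u) = x^2 + u^2, discount factor GAMMA, states snapped to a grid.
GAMMA = 0.95
XS = np.linspace(-2.0, 2.0, 41)   # discretized global state space
US = np.linspace(-1.0, 1.0, 21)   # finite set of admissible controls


def next_index(i: int, u: float) -> int:
    """Index of the grid point nearest to f(x_i, u); grid edges act as clipping."""
    x_next = 0.9 * np.sin(XS[i]) + u
    return int(np.argmin(np.abs(XS - x_next)))


# Tabulate transitions and one-step costs for every (state, control) pair.
NEXT = np.array([[next_index(i, u) for u in US] for i in range(len(XS))])
COST = XS[:, None] ** 2 + US[None, :] ** 2


def evaluate(policy: np.ndarray) -> np.ndarray:
    """Exact policy evaluation on the whole grid: solve (I - GAMMA*P) V = c."""
    n = len(XS)
    rows = np.arange(n)
    P = np.zeros((n, n))
    P[rows, NEXT[rows, policy]] = 1.0
    return np.linalg.solve(np.eye(n) - GAMMA * P, COST[rows, policy])


def improve_locally(policy, V, region, tol=1e-12):
    """Greedy policy update restricted to the state indices in `region`."""
    new_policy = policy.copy()
    for i in region:
        q = COST[i] + GAMMA * V[NEXT[i]]
        best = int(np.argmin(q))
        if q[best] < q[policy[i]] - tol:  # switch only on strict improvement
            new_policy[i] = best
    return new_policy


def partial_policy_iteration(regions, max_sweeps=1000):
    """Cycle through local regions; stop when a full sweep changes nothing."""
    policy = np.full(len(XS), int(np.argmin(np.abs(US))))  # initial law u = 0
    V = evaluate(policy)
    history = [V]
    for _ in range(max_sweeps):
        changed = False
        for region in regions:
            new_policy = improve_locally(policy, V, region)
            if not np.array_equal(new_policy, policy):
                policy, changed = new_policy, True
                V = evaluate(policy)
                history.append(V)
        if not changed:  # greedy in every region, hence on the union of regions
            break
    return policy, V, history


# Usage: four local regions covering the grid vs. one global region (standard PI).
regions = np.array_split(np.arange(len(XS)), 4)
policy_p, V_p, history_p = partial_policy_iteration(regions)
policy_f, V_f, history_f = partial_policy_iteration([np.arange(len(XS))])
print("max |V_partial - V_full| =", np.max(np.abs(V_p - V_f)))
print("policy evaluations:", len(history_p), "vs", len(history_f))
```

Because a control law is changed only where it strictly lowers the one-step lookahead cost, each new value vector is no larger than the previous one. Passing a single region that does not cover the grid makes the loop stop at a policy that is greedy only inside that region. This loosely mirrors the local optimum mentioned in the abstract. Choosing regions whose union is the whole grid recovers the global optimum in this toy setting.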

Bibliographic details

Source
Neurocomputing | 2021, No. 1 | pp. 23-34 | 12 pages
Authors
Liang Mingming; Wei Qinglai
Affiliations

School of Automation, Guangdong University of Technology, Guangzhou 510006, China | State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
Adaptive critic designs; Adaptive dynamic programming; Policy iteration; Neural networks; Neuro-dynamic programming; Nonlinear systems; Optimal control;
