Neurocomputing

A new history experience replay design for model-free adaptive dynamic programming


Abstract

An adaptive dynamic programming (ADP) controller is a powerful control technique that has been investigated, designed and tested in a wide range of applications for solving optimal control problems in complex systems. The ADP controller usually requires long training periods because its data usage is inefficient: each sample is discarded once used. History experience, also known as experience replay, is a powerful technique with the potential to accelerate the training process of learning and control. However, existing history experience designs cannot be directly applied to model-free ADP, because they rely on forward temporal difference (TD) information (e.g., state-action pairs) spanning the current time step and a future time step, which requires a model network to predict the future information. This paper proposes a new history experience replay design that avoids the use of a model network or identifier of the system environment. Specifically, we design the experience tuple with one-step-backward state-action information, so the TD error can be formed from a previous time step and the current time step. In addition, a systematic approach is proposed to integrate history experience into both the critic and action networks of the ADP controller design. The proposed approach is tested on two case studies: a cart-pole balancing task and a triple-link pendulum balancing task. For a fair comparison, we set the same initial starting states and initial weight parameters for both approaches under the same simulation environment. The statistical results show that the proposed approach improves both the average number of trials required to succeed and the success rate. Overall, the proposed approach reduced the average number of trials required to succeed by 26.5% for the cart-pole task and 43% for the triple-link balancing task. (C) 2017 Elsevier B.V. All rights reserved.
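The core idea in the abstract — storing one-step-backward state-action information so the TD error is formed from a previous and the current time step, with no model network predicting the future — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the class and function names (`BackwardReplayBuffer`, `backward_td_error`), the buffer capacity, and the scalar critic are all assumptions introduced for illustration.

```python
import random
from collections import deque

class BackwardReplayBuffer:
    """Replay buffer of one-step-backward experience tuples
    (s_prev, a_prev, r, s_curr). Because each tuple already pairs a
    previous state with the current state, a TD error can be computed
    from stored data alone, without a model network to predict s_next."""

    def __init__(self, capacity=10000):
        # Oldest tuples are evicted automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, s_prev, a_prev, reward, s_curr):
        """Store the transition observed between the previous and current step."""
        self.buffer.append((s_prev, a_prev, reward, s_curr))

    def sample(self, batch_size):
        """Draw a random minibatch for replaying history experience."""
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

def backward_td_error(critic, experience, gamma=0.95):
    """Backward TD error over a stored (previous, current) pair:
    delta = r + gamma * J(s_curr) - J(s_prev), where J is the critic's
    value estimate. Only already-observed states are needed."""
    s_prev, a_prev, r, s_curr = experience
    return r + gamma * critic(s_curr) - critic(s_prev)
```

Replayed tuples of this form could then drive critic (and, via the critic, action-network) updates, which is how the abstract describes integrating history experience into both networks of the ADP controller.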
