Prioritizing Useful Experience Replay for Heuristic Dynamic Programming-Based Learning Systems

Ni Zhen; Malla Naresh; Zhong Xiangnan

Abstract

Adaptive dynamic programming controllers usually need a long training period because samples are discarded after a single use, which makes data usage relatively inefficient. Prioritized experience replay (ER) promotes important experiences and learns the control process more efficiently. This paper proposes integrating the efficient learning capability of prioritized ER into heuristic dynamic programming (HDP). First, a one time-step backward state-action pair is used to design the ER tuple, which avoids the need for a model network. Second, a systematic approach is proposed to integrate ER into both the critic and action networks of the HDP controller design. The proposed approach is tested in two case studies: a cart-pole balancing task and a triple-link pendulum balancing task. For a fair comparison, the same initial weight parameters and initial starting states are set for both traditional HDP and the proposed approach under the same simulation environment. Compared with the traditional HDP approach, the proposed approach improves the average number of trials required to succeed by 60.56% for the cart-pole task and 56.89% for the triple-link balancing task. Results of ER-based HDP are also included for comparison. Moreover, a theoretical convergence analysis is presented to guarantee the stability of the proposed control design.
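
The abstract does not state how the ER tuple is stored or how priorities are computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a tuple made of the previous state, previous action, reward, and current state (one reading of the "one time-step backward state-action pair"), and it swaps in a common proportional prioritization rule based on the absolute temporal-difference error. The names `Experience`, `PrioritizedReplayBuffer`, `alpha`, and `eps` are hypothetical.

```python
import random
from collections import namedtuple

# Hypothetical tuple layout (an assumption, not taken from the paper):
# previous state, previous action, reward, and current state.
Experience = namedtuple("Experience", ["prev_state", "prev_action", "reward", "state"])


class PrioritizedReplayBuffer:
    """Fixed-capacity buffer that samples tuples in proportion to their priority."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha  # assumed exponent: 0 gives uniform sampling
        self.eps = eps      # keeps zero-error tuples sampleable
        self.items = []
        self.priorities = []

    def _priority(self, td_error):
        # Assumed rule: proportional prioritization on |TD error|.
        return (abs(td_error) + self.eps) ** self.alpha

    def add(self, experience, td_error):
        # Evict the oldest tuple once the buffer is full.
        if len(self.items) >= self.capacity:
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(experience)
        self.priorities.append(self._priority(td_error))

    def sample(self, batch_size, rng=random):
        # Draw with replacement, weighted by the stored priorities.
        indices = rng.choices(range(len(self.items)), weights=self.priorities, k=batch_size)
        return indices, [self.items[i] for i in indices]

    def update(self, index, td_error):
        # Refresh a tuple's priority after it has been replayed.
        self.priorities[index] = self._priority(td_error)
```

In a controller loop under these assumptions, each new tuple would be added with its current error, a small batch would be sampled to update the critic and action networks, and `update` would be called with the recomputed errors. Because the tuple already holds the observed next state, no model network is needed to predict it, which matches the motivation given in the abstract.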

Bibliographic details

Source
Cybernetics, IEEE Transactions on | 2019, Issue 11 | pp. 3911-3922 | 12 pages
Authors
Ni Zhen; Malla Naresh; Zhong Xiangnan
Author affiliations

South Dakota State Univ Elect Engn & Comp Sci Dept Brookings SD 57007 USA;

Univ North Texas Elect Engn Dept Denton TX 76203 USA;

Original format: PDF
Language: English
Keywords
Adaptive dynamic programming (ADP); experience replay (ER); heuristic dynamic programming (HDP); intelligent system; neural networks (NNs); online learning-based controller; prioritized sampling;
