Neurocomputing

A new history experience replay design for model-free adaptive dynamic programming


Abstract

An adaptive dynamic programming (ADP) controller is a powerful control technique that has been investigated, designed and tested in a wide range of applications for solving optimal control problems in complex systems. The ADP controller usually requires long training periods because its data usage is inefficient: each sample is discarded once used. History experience, also known as experience replay, is a powerful technique with the potential to accelerate the training process of learning and control. However, existing history experience designs cannot be directly applied to model-free ADP, because they rely on forward temporal difference (TD) information (e.g., state-action pairs) spanning the current time step and a future time step, which requires a model network to predict the future information. This paper proposes a new history experience replay design that avoids the use of a model network or identifier of the system environment. Specifically, we design the experience tuple with one-step-backward state-action information, so the TD error can be formed from a previous time step and the current time step. In addition, a systematic approach is proposed to integrate history experience into both the critic and action networks of the ADP controller design. The proposed approach is tested on two case studies: a cart-pole balancing task and a triple-link pendulum balancing task. For a fair comparison, we set the same initial starting states and initial weight parameters for both approaches under the same simulation environment. The statistical results show that the proposed approach improves both the average number of trials required to succeed and the success rate. Overall, the proposed approach reduced the average number of trials required to succeed by 26.5% for the cart-pole task and 43% for the triple-link balancing task. (C) 2017 Elsevier B.V. All rights reserved.
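The core idea in the abstract — storing one-step-backward state-action information so the TD error is formed from a previous and the current time step, with no model network predicting the future — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the class and function names (`BackwardReplayBuffer`, `backward_td_error`), the buffer capacity, and the scalar critic are all assumptions introduced for illustration.

```python
import random
from collections import deque

class BackwardReplayBuffer:
    """Replay buffer of one-step-backward experience tuples
    (s_prev, a_prev, r, s_curr). Because each tuple already pairs a
    previous state with the current state, a TD error can be computed
    from stored data alone, without a model network to predict s_next."""

    def __init__(self, capacity=10000):
        # Oldest tuples are evicted automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, s_prev, a_prev, reward, s_curr):
        """Store the transition observed between the previous and current step."""
        self.buffer.append((s_prev, a_prev, reward, s_curr))

    def sample(self, batch_size):
        """Draw a random minibatch for replaying history experience."""
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

def backward_td_error(critic, experience, gamma=0.95):
    """Backward TD error over a stored (previous, current) pair:
    delta = r + gamma * J(s_curr) - J(s_prev), where J is the critic's
    value estimate. Only already-observed states are needed."""
    s_prev, a_prev, r, s_curr = experience
    return r + gamma * critic(s_curr) - critic(s_prev)
```

Replayed tuples of this form could then drive critic (and, via the critic, action-network) updates, which is how the abstract describes integrating history experience into both networks of the ADP controller.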
