Conference: International Joint Conference on Neural Networks

Online learning control based on projected gradient temporal difference and advanced heuristic dynamic programming


Abstract

We present a novel online learning control algorithm (OLCPA) that combines projected gradient temporal difference for the action-value function (PGTDAVF) with advanced heuristic dynamic programming with one-step delay (AHDPOSD). PGTDAVF guarantees the convergence of temporal-difference (TD) based policy learning with smooth action-value function approximators, such as neural networks. AHDPOSD is a framework designed specifically to embed PGTDAVF for online learning control. It is consistent with the intent of temporal-difference learning and also keeps PGTDAVF effective in a nonidentical-policy environment, which makes the method more practical. The proposed algorithm thereby achieves stability and practicality together. Finally, a simulation of online learning control on a cart-pole benchmark demonstrates the practical control capability and efficiency of the presented method.
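The abstract does not spell out the PGTDAVF update itself. As a rough illustration of the gradient temporal-difference family it belongs to, the sketch below shows one step of the well-known linear TDC (TD with gradient correction) update applied to action-value features. It is not the paper's method: the function name `tdc_update`, the feature vectors, and the step sizes are all assumptions chosen for illustration, and the paper targets smooth nonlinear approximators rather than the linear case shown here.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def tdc_update(theta, w, phi, reward, phi_next,
               gamma=0.9, alpha=0.1, beta=0.1):
    """One linear TDC step for Q(s, a) ~= theta . phi(s, a).

    theta    : action-value weights (illustrative, linear case only)
    w        : auxiliary weights estimating the expected TD error
    phi      : features of the current state-action pair
    phi_next : features of the next state-action pair
    Returns (new_theta, new_w, td_error).
    """
    # Ordinary TD error for the action-value estimate.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    # Auxiliary estimate of the TD error for the current features.
    correction = dot(phi, w)
    # TD step plus the gradient-correction term; both updates use the old w.
    new_theta = [t + alpha * (delta * p - gamma * correction * pn)
                 for t, p, pn in zip(theta, phi, phi_next)]
    new_w = [wi + beta * (delta - correction) * p
             for wi, p in zip(w, phi)]
    return new_theta, new_w, delta


if __name__ == "__main__":
    theta, w, delta = tdc_update([0.0, 0.0], [0.0, 0.0],
                                 phi=[1.0, 0.0], reward=1.0,
                                 phi_next=[0.0, 1.0])
    print(theta, w, delta)
```

With `w` at zero the correction term vanishes and the step reduces to ordinary TD; the auxiliary weights are what distinguish gradient-TD methods from the plain update as learning proceeds.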
