Journal: 《化工学报》 (CIESC Journal)

Recursive Least-Squares Temporal Difference Reinforcement Learning Algorithm Based on Improved ELM and Its Application

         

Abstract

To meet the accuracy and computation-time requirements placed on value function approximation algorithms, a recursive least-squares temporal difference reinforcement learning algorithm with eligibility traces based on an improved extreme learning machine (RLSTD(λ)-IELM) is proposed. First, a recursive formulation is introduced into the least-squares temporal difference (LSTD) algorithm to eliminate the matrix inversion in the least-squares solution, giving the recursive least-squares temporal difference (RLSTD) algorithm and reducing the algorithm's complexity and computational load. Second, because LSTD(0) converges slowly and exploits experience inefficiently, eligibility traces are added to RLSTD to form RLSTD(λ), which raises sample utilization and speeds up convergence so that the estimate can converge to the true value after the same number of trajectories. Third, since the value function in most reinforcement learning problems is monotonic, while the conventional extreme learning machine (ELM) typically uses the Sigmoid activation function, which has two-sided suppression and a higher computational cost, the Sigmoid is replaced by the Softplus activation function, which has one-sided suppression, in order to reduce the computational load and improve computing speed.
Comparative experiments on the generalized Hop-world problem against the least-squares temporal difference learning algorithm based on extreme learning machine (LSTD-ELM) and the least-squares temporal difference learning algorithm based on radial basis functions (LSTD-RBF) show that RLSTD(λ)-IELM computes faster than LSTD-ELM and is more accurate than LSTD-RBF. The proposed algorithm effectively improves computing speed while meeting the accuracy requirement, and under some conditions its accuracy is higher than that of both other algorithms.
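The abstract gives no equations, so the following is only a minimal sketch of how the three ingredients could fit together, not the authors' implementation. It assumes the standard Sherman-Morrison form of the recursive LSTD(λ) update and an ELM-style hidden layer whose random input weights stay fixed while only the output weights `theta` are learned. All names here (`softplus`, `make_elm_features`, `rlstd_lambda`, the initial scale `delta`, the scalar state encoding) are hypothetical choices made for illustration.

```python
import math
import random


def softplus(x):
    # Numerically stable log(1 + exp(x)); one-sided: tends to 0 for x << 0.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def make_elm_features(n_hidden, seed=0):
    # ELM-style hidden layer (assumption): random input weights and biases
    # are drawn once and never trained.
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]

    def features(x):
        # x is a scalar state encoding (hypothetical; e.g. a normalized index).
        return [softplus(wi * x + bi) for wi, bi in zip(w, b)]

    return features


def rlstd_lambda(episodes, features, n_features, gamma=1.0, lam=0.5, delta=1e6):
    # episodes: list of episodes, each a list of (state, reward, next_state);
    # next_state is None when the transition ends in a terminal state.
    n = n_features
    theta = [0.0] * n
    # P tracks the inverse of the LSTD matrix A, starting from delta * I.
    P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
    for episode in episodes:
        z = [0.0] * n  # eligibility trace, reset at the start of each episode
        for state, reward, next_state in episode:
            phi = features(state)
            phi_next = [0.0] * n if next_state is None else features(next_state)
            z = [gamma * lam * zi + p for zi, p in zip(z, phi)]
            d = [p - gamma * pn for p, pn in zip(phi, phi_next)]
            Pz = [sum(P[i][j] * z[j] for j in range(n)) for i in range(n)]
            dP = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
            denom = 1.0 + sum(di * v for di, v in zip(d, Pz))
            if abs(denom) < 1e-12:
                continue  # skip a degenerate rank-one update
            K = [v / denom for v in Pz]  # gain vector
            err = reward - sum(di * ti for di, ti in zip(d, theta))
            theta = [ti + ki * err for ti, ki in zip(theta, K)]
            # Sherman-Morrison step: no explicit matrix inversion is needed.
            P = [[P[i][j] - K[i] * dP[j] for j in range(n)] for i in range(n)]
    return theta
```

To try it, one could call `rlstd_lambda(episodes, make_elm_features(8), 8)` on trajectories from any small episodic task; the value estimate for a state `x` is then the dot product of `theta` with `features(x)`. The generalized Hop-world benchmark itself is not reproduced here. Replacing the per-batch solve with the rank-one update keeps the cost of each step quadratic in the number of hidden units, which is the kind of saving the recursive formulation targets.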
