Recursive least-squares temporal difference reinforcement learning algorithm based on improved ELM and its application

Xu Yuan (徐圆); Huang Bingming (黄兵明); He Yanlin (贺彦林)


Abstract

To meet the accuracy and computation-time requirements of value function approximation, a recursive least-squares temporal difference reinforcement learning algorithm with eligibility traces based on an improved extreme learning machine (RLSTD(λ)-IELM) is proposed. First, a recursive update is introduced into the least-squares temporal difference (LSTD) algorithm to eliminate the matrix inversion in the least-squares solution, giving the recursive least-squares temporal difference (RLSTD) algorithm and reducing its complexity and computational cost. Second, because LSTD(0) converges slowly and exploits experience inefficiently, eligibility traces are added to RLSTD to form RLSTD(λ), which raises sample utilization and speeds up convergence so that the estimate can converge to the true value after the same number of trajectories. Third, the value function in most reinforcement learning problems is monotonic, whereas a conventional ELM typically uses the Sigmoid activation function, whose two-sided suppression increases the computational cost. The Softplus activation function, which has one-sided suppression, is therefore used in place of Sigmoid in the ELM network to reduce the computational load and improve computing speed. Comparative experiments on the generalized Hop-world problem against the least-squares temporal difference algorithm based on an extreme learning machine (LSTD-ELM) and the least-squares temporal difference algorithm based on radial basis functions (LSTD-RBF) show that RLSTD(λ)-IELM computes faster than LSTD-ELM and is more accurate than LSTD-RBF. Under some conditions its accuracy is higher than that of both baseline algorithms.
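The abstract gives no equations, so the following is only a minimal Python sketch of the ideas it names, not the authors' implementation. It assumes the textbook RLS-TD(λ) recursion (a Sherman-Morrison rank-one update that avoids matrix inversion) and an ELM-style hidden layer with fixed random weights and a Softplus activation. All class and function names, the hyperparameters (`gamma`, `lam`, `delta`, six hidden units), and the toy chain task are hypothetical choices for illustration. The chain is not the paper's generalized Hop-world benchmark.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)): monotonic, suppressed on one side only.
    return np.logaddexp(0.0, x)

class SoftplusELMFeatures:
    """ELM-style hidden layer: random input weights are fixed, never trained."""
    def __init__(self, n_inputs, n_hidden, rng):
        self.W = rng.uniform(-1.0, 1.0, size=(n_hidden, n_inputs))
        self.b = rng.uniform(-1.0, 1.0, size=n_hidden)

    def __call__(self, state):
        return softplus(self.W @ np.atleast_1d(state) + self.b)

class RLSTDLambda:
    """Recursive least-squares TD(lambda) on the output weights theta (assumed form)."""
    def __init__(self, n_features, gamma=0.9, lam=0.5, delta=1.0):
        self.gamma, self.lam = gamma, lam
        self.theta = np.zeros(n_features)
        self.P = np.eye(n_features) / delta  # running inverse of (delta*I + sum z d^T)
        self.z = np.zeros(n_features)        # eligibility trace

    def start_episode(self):
        self.z = np.zeros_like(self.z)

    def update(self, phi, reward, phi_next):
        self.z = self.gamma * self.lam * self.z + phi
        d = phi - self.gamma * phi_next
        Pz = self.P @ self.z
        gain = Pz / (1.0 + d @ Pz)           # no matrix inversion needed
        self.theta = self.theta + gain * (reward - d @ self.theta)
        self.P = self.P - np.outer(gain, d @ self.P)

def batch_lstd_lambda(transitions, n_features, gamma=0.9, lam=0.5, delta=1.0):
    """Reference batch solution with an explicit linear solve, for comparison."""
    A = delta * np.eye(n_features)
    b = np.zeros(n_features)
    z = np.zeros(n_features)
    for phi, reward, phi_next, first in transitions:
        z = phi if first else gamma * lam * z + phi
        A += np.outer(z, phi - gamma * phi_next)
        b += z * reward
    return np.linalg.solve(A, b)

def run_chain(n_states=10, n_episodes=30, n_hidden=6, seed=0):
    """Toy chain: hop 1 or 2 states to the right, reward -1 per hop (hypothetical task)."""
    rng = np.random.default_rng(seed)
    features = SoftplusELMFeatures(1, n_hidden, rng)
    learner = RLSTDLambda(n_hidden)
    transitions = []
    last = n_states - 1
    for _ in range(n_episodes):
        learner.start_episode()
        s, first = 0, True
        while s < last:
            s_next = min(s + int(rng.integers(1, 3)), last)
            phi = features(s / last)
            # The terminal state has value zero, so its feature vector is zero.
            phi_next = np.zeros(n_hidden) if s_next == last else features(s_next / last)
            learner.update(phi, -1.0, phi_next)
            transitions.append((phi, -1.0, phi_next, first))
            s, first = s_next, False
    return learner, transitions

learner, transitions = run_chain()
theta_batch = batch_lstd_lambda(transitions, 6)
print(np.max(np.abs(learner.theta - theta_batch)))
```

With zero initial weights, the recursive update and the regularized batch solve are algebraically the same estimator, so the printed difference should be at the level of floating-point round-off. The recursion costs O(n²) per step in the number of hidden units, whereas re-solving the batch system after every step costs O(n³).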

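Bibliographic record

Source
《化工学报》 (CIESC Journal) | 2017, No. 3 | pp. 916-924 | 9 pages
Authors
Xu Yuan; Huang Bingming; He Yanlin
Affiliation
College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029 (all three authors)
Original format: PDF
Language of full text: Chinese
CLC classification: Applications of automation technology in various fields
Keywords
reinforcement learning; activation function; recursive least-squares algorithm; function approximation; generalized Hop-world problem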
