Chinese Control and Decision Conference

Least-Squares Temporal Difference Learning with Eligibility Traces based on Regularized Extreme Learning Machine

Abstract

The task of learning the value function under a fixed policy in continuous Markov decision processes (MDPs) is considered. Although the extreme learning machine (ELM) trains quickly and avoids the parameter-tuning issues of traditional artificial neural networks (ANNs), the randomness of its hidden-layer parameters leads to fluctuating performance. In this paper, a least-squares temporal difference algorithm with eligibility traces based on the regularized extreme learning machine (RELM-LSTD(λ)) is proposed to overcome these ELM-induced problems in reinforcement learning. The proposed algorithm combines the LSTD(λ) algorithm with an RELM, which is used to approximate the value function; an eligibility-trace term is further introduced to improve data efficiency. In experiments, the performance of the proposed algorithm is demonstrated and compared with that of LSTD and ELM-LSTD. The results show that the proposed algorithm achieves more stable and better performance in approximating the value function under a fixed policy.
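As a rough illustration of the idea summarized in the abstract, the sketch below combines LSTD(λ) with a random sigmoid hidden layer (standing in for the ELM feature map) and a ridge (Tikhonov) term playing the role of the RELM regularization. All function names, dimensions, and hyperparameters here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical random-feature map standing in for the RELM hidden layer:
# input weights and biases are drawn once at random and kept fixed, as in ELM.
n_hidden, state_dim = 50, 1
W_in = rng.normal(size=(n_hidden, state_dim))
b_in = rng.normal(size=n_hidden)

def features(s):
    """Sigmoid hidden-layer outputs of the random ELM for state s."""
    return 1.0 / (1.0 + np.exp(-(W_in @ np.atleast_1d(s) + b_in)))

def relm_lstd_lambda(trajectory, gamma=0.95, lam=0.8, reg=1e-3):
    """LSTD(lambda) over ELM features with an L2 regularization term.

    trajectory: iterable of (s, r, s_next, done) tuples collected under
    a fixed policy. Returns output weights w with V(s) ~= w @ features(s).
    """
    A = np.zeros((n_hidden, n_hidden))
    b = np.zeros(n_hidden)
    z = np.zeros(n_hidden)                      # eligibility trace
    for s, r, s_next, done in trajectory:
        phi, phi_next = features(s), features(s_next)
        z = gamma * lam * z + phi               # accumulate the trace
        # Terminal next states contribute no bootstrapped value.
        A += np.outer(z, phi - (0.0 if done else gamma) * phi_next)
        b += r * z
        if done:
            z = np.zeros(n_hidden)              # reset trace between episodes
    # The ridge term keeps A + reg*I invertible despite the random,
    # possibly ill-conditioned features -- the motivation for RELM over ELM.
    return np.linalg.solve(A + reg * np.eye(n_hidden), b)
```

Because the hidden layer is random and fixed, the only learned quantities are the linear output weights, which the regularized least-squares solve recovers in closed form; this is what gives the ELM family its fast training relative to backpropagation-trained ANNs.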
