Knowledge-Based Systems

An efficient L2-norm regularized least-squares temporal difference learning algorithm



Abstract

In reinforcement learning, when samples are limited, as in many real applications, Least-Squares Temporal Difference (LSTD) learning is prone to over-fitting, which can be overcome by introducing regularization. However, the solution of regularized LSTD still depends on costly matrix inversion operations. In this paper we investigate L2-norm regularized LSTD learning and propose an efficient algorithm that avoids this expensive computation. We derive LSTD using the Bellman operator together with a projection operator, introduce an L2-norm penalty to avoid over-fitting, and describe the difference between Bellman residual minimization and LSTD. We then propose an efficient recursive least-squares algorithm for L2-norm regularized LSTD, which eliminates matrix inversion operations and effectively decreases computational complexity. Empirical comparisons on the Boyan chain problem show that the new algorithm outperforms regularized LSTD.
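To illustrate why a recursive formulation helps, a minimal sketch follows, assuming the standard L2-regularized LSTD fixed point theta = (A + mu*I)^{-1} b with A = sum_t phi_t (phi_t - gamma*phi_{t+1})^T and b = sum_t r_t phi_t. This is not the paper's exact algorithm (the abstract does not give it); the function name rls_lstd and the parameters mu and gamma are illustrative assumptions. Instead of inverting A + mu*I, an O(d^3) operation, the inverse is maintained incrementally via Sherman-Morrison rank-one updates at O(d^2) per sample, which is one common way recursive least-squares variants of LSTD avoid explicit inversion.

```python
# A minimal sketch, not the paper's algorithm: recursive least-squares
# LSTD with an L2 penalty, avoiding explicit matrix inversion.
import numpy as np

def rls_lstd(transitions, n_features, gamma=0.95, mu=1.0):
    """Recursive L2-regularized LSTD (illustrative).

    Maintains P ~ (A + mu*I)^{-1} via Sherman-Morrison rank-one
    updates, so no matrix inversion is ever performed explicitly.
    transitions: iterable of (phi, reward, phi_next) feature tuples.
    """
    P = np.eye(n_features) / mu            # P = (mu*I)^{-1} initially
    b = np.zeros(n_features)
    for phi, r, phi_next in transitions:
        v = phi - gamma * phi_next         # temporal-difference feature
        Pu = P @ phi
        # Sherman-Morrison update for A <- A + phi v^T:
        # (A + u v^T)^{-1} = P - (P u)(v^T P) / (1 + v^T P u)
        P -= np.outer(Pu, v @ P) / (1.0 + v @ Pu)
        b += r * phi
    return P @ b                           # theta = (A + mu*I)^{-1} b
```

On a Boyan-chain-style stream of (phi, reward, phi_next) samples, rls_lstd(samples, n_features=4) returns the value-function weights with O(d^2) work per sample rather than the O(d^3) cost of re-solving the regularized system.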


