Knowledge-Based Systems

An efficient L2-norm regularized least-squares temporal difference learning algorithm



Abstract

In reinforcement learning, when samples are limited, as in many real applications, Least-Squares Temporal Difference (LSTD) learning is prone to over-fitting, which can be overcome by introducing regularization. However, the solution of regularized LSTD still depends on costly matrix inversion operations. In this paper we investigate L2-norm regularized LSTD learning and propose an efficient algorithm that avoids this expensive computation. We derive LSTD using the Bellman operator together with a projection operator, introduce an L2-norm penalty to avoid over-fitting, and describe the difference between Bellman residual minimization and LSTD. We then propose an efficient recursive least-squares algorithm for L2-norm regularized LSTD, which eliminates matrix inversion operations and effectively decreases computational complexity. Empirical comparisons on the Boyan chain problem show that the new algorithm outperforms regularized LSTD.
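To illustrate why a recursive formulation helps, a minimal sketch follows, assuming the standard L2-regularized LSTD fixed point theta = (A + mu*I)^{-1} b with A = sum_t phi_t (phi_t - gamma*phi_{t+1})^T and b = sum_t r_t phi_t. This is not the paper's exact algorithm (the abstract does not give it); the function name rls_lstd and the parameters mu and gamma are illustrative assumptions. Instead of inverting A + mu*I, an O(d^3) operation, the inverse is maintained incrementally via Sherman-Morrison rank-one updates at O(d^2) per sample, which is one common way recursive least-squares variants of LSTD avoid explicit inversion.

```python
# A minimal sketch, not the paper's algorithm: recursive least-squares
# LSTD with an L2 penalty, avoiding explicit matrix inversion.
import numpy as np

def rls_lstd(transitions, n_features, gamma=0.95, mu=1.0):
    """Recursive L2-regularized LSTD (illustrative).

    Maintains P ~ (A + mu*I)^{-1} via Sherman-Morrison rank-one
    updates, so no matrix inversion is ever performed explicitly.
    transitions: iterable of (phi, reward, phi_next) feature tuples.
    """
    P = np.eye(n_features) / mu            # P = (mu*I)^{-1} initially
    b = np.zeros(n_features)
    for phi, r, phi_next in transitions:
        v = phi - gamma * phi_next         # temporal-difference feature
        Pu = P @ phi
        # Sherman-Morrison update for A <- A + phi v^T:
        # (A + u v^T)^{-1} = P - (P u)(v^T P) / (1 + v^T P u)
        P -= np.outer(Pu, v @ P) / (1.0 + v @ Pu)
        b += r * phi
    return P @ b                           # theta = (A + mu*I)^{-1} b
```

On a Boyan-chain-style stream of (phi, reward, phi_next) samples, rls_lstd(samples, n_features=4) returns the value-function weights with O(d^2) work per sample rather than the O(d^3) cost of re-solving the regularized system.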


