Proximal Gradient Temporal Difference Learning Algorithms

International Joint Conference on Artificial Intelligence

Abstract

In this paper, we describe proximal gradient temporal difference learning, which provides a principled way to design and analyze true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not with respect to their original objective functions as previously attempted, but rather with respect to primal-dual saddle-point objective functions. We also conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Previous analyses of this class of algorithms used stochastic approximation techniques to prove asymptotic convergence; no finite-sample analysis had been attempted. An accelerated algorithm, GTD2-MP, is also proposed, which uses proximal "mirror maps" to yield acceleration. The results of our theoretical analysis imply that the GTD family of algorithms is comparable to, and may indeed be preferred over, existing least-squares TD methods for off-policy learning, due to its linear complexity. We provide experimental results showing the improved performance of our accelerated gradient TD methods.
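The saddle-point derivation referred to above follows the standard convex-conjugate reformulation of the mean-squared projected Bellman error (MSPBE). The LaTeX sketch below uses the conventional GTD quantities A, b, and C; the notation is reconstructed from the usual formulation in the literature, not quoted from this page.

% Conventional GTD quantities (expectations over off-policy transitions):
%   A = E[ phi_t (phi_t - gamma*phi_{t+1})^T ],  b = E[ r_t phi_t ],
%   C = E[ phi_t phi_t^T ].
\mathrm{MSPBE}(\theta)
  = \tfrac{1}{2}\,\bigl\lVert b - A\theta \bigr\rVert_{C^{-1}}^{2}
  = \max_{y}\;\Bigl(\langle b - A\theta,\, y\rangle - \tfrac{1}{2}\, y^{\top} C\, y\Bigr),
% so minimizing the MSPBE is equivalent to the convex-concave
% saddle-point problem
\min_{\theta}\,\max_{y}\; L(\theta, y)
  \;=\; \langle b - A\theta,\, y\rangle \;-\; \tfrac{1}{2}\, y^{\top} C\, y,
% whose stochastic primal-dual gradient updates recover GTD2.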
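In the same spirit, GTD2-MP can be pictured as an extragradient (mirror-prox) step on this saddle-point problem. The NumPy sketch below is a minimal reconstruction assuming linear value estimates and Euclidean proximal maps; the function name, variable names, and step sizes are illustrative, not taken from the paper.

import numpy as np

def gtd2_mp_step(theta, y, phi, r, phi_next, alpha, beta, gamma):
    """One extragradient-style GTD2 update on a single transition
    (phi, r, phi_next), with linear values V(s) = theta . phi(s)."""
    # TD error at the current primal iterate.
    delta = r + gamma * (theta @ phi_next) - theta @ phi
    # Midpoint ("prediction") step: plain GTD2 stochastic gradients.
    y_mid = y + beta * (delta - phi @ y) * phi
    theta_mid = theta + alpha * (phi @ y) * (phi - gamma * phi_next)
    # Re-evaluate the gradients at the midpoint, then take the
    # corrected step from the original iterate.
    delta_mid = r + gamma * (theta_mid @ phi_next) - theta_mid @ phi
    y_new = y + beta * (delta_mid - phi @ y_mid) * phi
    theta_new = theta + alpha * (phi @ y_mid) * (phi - gamma * phi_next)
    return theta_new, y_new

# Smoke test on random features.
rng = np.random.default_rng(0)
d = 4
theta, y = np.zeros(d), np.zeros(d)
for _ in range(1000):
    phi, phi_next = rng.normal(size=d), rng.normal(size=d)
    theta, y = gtd2_mp_step(theta, y, phi, 1.0, phi_next,
                            alpha=0.01, beta=0.01, gamma=0.9)

The midpoint re-evaluation is what distinguishes this scheme from plain GTD2: the step actually taken uses gradients computed at the predicted iterate, which is the source of the acceleration.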