The Journal of Artificial Intelligence Research

Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity



Abstract

In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not by starting from their original objective functions, as previously attempted, but rather from a primal-dual saddle-point objective function. We also conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Previous analyses of this class of algorithms use stochastic approximation techniques to prove asymptotic convergence, and do not provide any finite-sample analysis. We also propose an accelerated algorithm, called GTD2-MP, that uses proximal "mirror maps" to yield an improved convergence rate. The results of our theoretical analysis imply that the GTD family of algorithms are comparable and may indeed be preferred over existing least squares TD methods for off-policy learning, due to their linear complexity. We provide experimental results showing the improved performance of our accelerated gradient TD methods.
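To make the saddle-point view mentioned in the abstract concrete, the following is a minimal sketch, assuming linear function approximation, of a GTD2-style stochastic primal-dual step. Under the common formulation, the objective has the form min over theta, max over w of <b - A*theta, w> - (1/2)*||w||^2_M, with A = E[phi (phi - gamma*phi')^T], b = E[r*phi], and M = E[phi phi^T]; the function name, step-size arguments, and synthetic data below are illustrative assumptions, not the paper's exact algorithm statement.

```python
import numpy as np

def gtd2_update(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One GTD2-style stochastic primal-dual step (illustrative sketch).

    theta    : primal weights of the linear value function V(s) ~ phi(s) . theta
    w        : dual (correction) weights from the saddle-point formulation
    phi      : feature vector of the current state
    phi_next : feature vector of the next state
    alpha, beta : primal and dual step sizes (assumed constant here)
    """
    delta = reward + gamma * phi_next.dot(theta) - phi.dot(theta)  # TD error
    # Dual ascent step: stochastic gradient of the saddle-point objective in w.
    w_new = w + beta * (delta - phi.dot(w)) * phi
    # Primal descent step: stochastic gradient in theta, using the dual weights.
    theta_new = theta + alpha * (phi - gamma * phi_next) * phi.dot(w)
    return theta_new, w_new

# Purely synthetic usage example (random features, a single transition).
rng = np.random.default_rng(0)
theta, w = np.zeros(4), np.zeros(4)
phi, phi_next = rng.random(4), rng.random(4)
theta, w = gtd2_update(theta, w, phi, phi_next,
                       reward=1.0, gamma=0.99, alpha=0.05, beta=0.1)
```

The accelerated GTD2-MP variant described in the abstract can be thought of as replacing this single gradient step with an extragradient (mirror-prox) step, evaluating the gradients at an intermediate point before committing to the update, which is what yields the improved convergence rate.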
