The Journal of Artificial Intelligence Research

Proximal Gradient Temporal Difference Learning: Stable Reinforcement Learning with Polynomial Sample Complexity



Abstract

In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not by starting from their original objective functions, as previously attempted, but rather from a primal-dual saddle-point objective function. We also conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Previous analyses of this class of algorithms use stochastic approximation techniques to prove asymptotic convergence, and do not provide any finite-sample analysis. We also propose an accelerated algorithm, called GTD2-MP, that uses proximal "mirror maps" to yield an improved convergence rate. The results of our theoretical analysis imply that the GTD family of algorithms are comparable and may indeed be preferred over existing least squares TD methods for off-policy learning, due to their linear complexity. We provide experimental results showing the improved performance of our accelerated gradient TD methods.
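To make the saddle-point view mentioned in the abstract concrete, the following is a minimal sketch, assuming linear function approximation, of a GTD2-style stochastic primal-dual step. Under the common formulation, the objective has the form min over theta, max over w of <b - A*theta, w> - (1/2)*||w||^2_M, with A = E[phi (phi - gamma*phi')^T], b = E[r*phi], and M = E[phi phi^T]; the function name, step-size arguments, and synthetic data below are illustrative assumptions, not the paper's exact algorithm statement.

```python
import numpy as np

def gtd2_update(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One GTD2-style stochastic primal-dual step (illustrative sketch).

    theta    : primal weights of the linear value function V(s) ~ phi(s) . theta
    w        : dual (correction) weights from the saddle-point formulation
    phi      : feature vector of the current state
    phi_next : feature vector of the next state
    alpha, beta : primal and dual step sizes (assumed constant here)
    """
    delta = reward + gamma * phi_next.dot(theta) - phi.dot(theta)  # TD error
    # Dual ascent step: stochastic gradient of the saddle-point objective in w.
    w_new = w + beta * (delta - phi.dot(w)) * phi
    # Primal descent step: stochastic gradient in theta, using the dual weights.
    theta_new = theta + alpha * (phi - gamma * phi_next) * phi.dot(w)
    return theta_new, w_new

# Purely synthetic usage example (random features, a single transition).
rng = np.random.default_rng(0)
theta, w = np.zeros(4), np.zeros(4)
phi, phi_next = rng.random(4), rng.random(4)
theta, w = gtd2_update(theta, w, phi, phi_next,
                       reward=1.0, gamma=0.99, alpha=0.05, beta=0.1)
```

The accelerated GTD2-MP variant described in the abstract can be thought of as replacing this single gradient step with an extragradient (mirror-prox) step, evaluating the gradients at an intermediate point before committing to the update, which is what yields the improved convergence rate.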
