Proximal Gradient Temporal Difference Learning Algorithms

International Joint Conference on Artificial Intelligence

Abstract

In this paper, we describe proximal gradient temporal difference learning, which provides a principled way to design and analyze true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD) reinforcement learning methods can be formally derived, not with respect to their original objective functions as previously attempted, but rather with respect to primal-dual saddle-point objective functions. We also conduct a saddle-point error analysis to obtain finite-sample bounds on their performance. Previous analyses of this class of algorithms used stochastic approximation techniques to prove asymptotic convergence; no finite-sample analysis had been attempted. An accelerated algorithm, GTD2-MP, is also proposed, which uses proximal "mirror maps" to yield acceleration. The results of our theoretical analysis imply that the GTD family of algorithms is comparable to, and may indeed be preferred over, existing least-squares TD methods for off-policy learning, due to its linear complexity. We provide experimental results showing the improved performance of our accelerated gradient TD methods.
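The saddle-point derivation referred to above follows the standard convex-conjugate reformulation of the mean-squared projected Bellman error (MSPBE). The LaTeX sketch below uses the conventional GTD quantities A, b, and C; the notation is reconstructed from the usual formulation in the literature, not quoted from this page.

% Conventional GTD quantities (expectations over off-policy transitions):
%   A = E[ phi_t (phi_t - gamma*phi_{t+1})^T ],  b = E[ r_t phi_t ],
%   C = E[ phi_t phi_t^T ].
\mathrm{MSPBE}(\theta)
  = \tfrac{1}{2}\,\bigl\lVert b - A\theta \bigr\rVert_{C^{-1}}^{2}
  = \max_{y}\;\Bigl(\langle b - A\theta,\, y\rangle - \tfrac{1}{2}\, y^{\top} C\, y\Bigr),
% so minimizing the MSPBE is equivalent to the convex-concave
% saddle-point problem
\min_{\theta}\,\max_{y}\; L(\theta, y)
  \;=\; \langle b - A\theta,\, y\rangle \;-\; \tfrac{1}{2}\, y^{\top} C\, y,
% whose stochastic primal-dual gradient updates recover GTD2.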
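In the same spirit, GTD2-MP can be pictured as an extragradient (mirror-prox) step on this saddle-point problem. The NumPy sketch below is a minimal reconstruction assuming linear value estimates and Euclidean proximal maps; the function name, variable names, and step sizes are illustrative, not taken from the paper.

import numpy as np

def gtd2_mp_step(theta, y, phi, r, phi_next, alpha, beta, gamma):
    """One extragradient-style GTD2 update on a single transition
    (phi, r, phi_next), with linear values V(s) = theta . phi(s)."""
    # TD error at the current primal iterate.
    delta = r + gamma * (theta @ phi_next) - theta @ phi
    # Midpoint ("prediction") step: plain GTD2 stochastic gradients.
    y_mid = y + beta * (delta - phi @ y) * phi
    theta_mid = theta + alpha * (phi @ y) * (phi - gamma * phi_next)
    # Re-evaluate the gradients at the midpoint, then take the
    # corrected step from the original iterate.
    delta_mid = r + gamma * (theta_mid @ phi_next) - theta_mid @ phi
    y_new = y + beta * (delta_mid - phi @ y_mid) * phi
    theta_new = theta + alpha * (phi @ y_mid) * (phi - gamma * phi_next)
    return theta_new, y_new

# Smoke test on random features.
rng = np.random.default_rng(0)
d = 4
theta, y = np.zeros(d), np.zeros(d)
for _ in range(1000):
    phi, phi_next = rng.normal(size=d), rng.normal(size=d)
    theta, y = gtd2_mp_step(theta, y, phi, 1.0, phi_next,
                            alpha=0.01, beta=0.01, gamma=0.9)

The midpoint re-evaluation is what distinguishes this scheme from plain GTD2: the step actually taken uses gradients computed at the predicted iterate, which is the source of the acceleration.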