JMLR: Workshop and Conference Proceedings

Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise



Abstract

Greedy-GQ is an off-policy two-timescale algorithm for optimal control in reinforcement learning. This paper develops the first finite-sample analysis of the Greedy-GQ algorithm with linear function approximation under Markovian noise. Our finite-sample analysis provides theoretical justification for choosing stepsizes for this two-timescale algorithm so that it converges faster in practice, and it reveals a trade-off between the convergence rate and the quality of the obtained policy. Our paper extends the finite-sample analyses of two-timescale reinforcement learning algorithms from policy evaluation to optimal control, which is of greater practical interest. Specifically, in contrast to existing finite-sample analyses of two-timescale methods, e.g., GTD, GTD2, and TDC, whose objective functions are convex, the objective function of the Greedy-GQ algorithm is non-convex. Moreover, the Greedy-GQ algorithm is not a linear two-timescale stochastic approximation algorithm. The techniques in this paper provide a general framework for the finite-sample analysis of non-convex value-based reinforcement learning algorithms for optimal control.
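To make the two-timescale structure concrete, below is a minimal sketch of the Greedy-GQ update rule (Maei et al., 2010) with linear function approximation, run on a randomly generated toy MDP. The MDP, features, and stepsizes are illustrative assumptions and not the paper's setup; the key point is that the auxiliary iterate `omega` uses a larger stepsize (fast timescale) than the main iterate `theta` (slow timescale), and samples are drawn from a single Markovian trajectory.

```python
import numpy as np

# Hypothetical toy setup: a random 3-state, 2-action MDP with
# 4-dimensional linear features phi(s, a). All quantities here are
# illustrative assumptions, not the paper's experimental configuration.
rng = np.random.default_rng(0)
n_states, n_actions, d = 3, 2, 4
gamma = 0.9
features = rng.normal(size=(n_states, n_actions, d))        # phi(s, a)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

theta = np.zeros(d)   # main (slow-timescale) iterate
omega = np.zeros(d)   # auxiliary (fast-timescale) iterate
alpha, beta = 0.005, 0.05   # slow / fast stepsizes, beta > alpha

s = 0
for t in range(5000):
    a = int(rng.integers(n_actions))            # behavior policy: uniform
    s_next = int(rng.choice(n_states, p=P[s, a]))  # Markovian sample
    phi = features[s, a]

    # Greedy next action under the current theta (the max in the target
    # is what makes the objective non-convex).
    q_next = features[s_next] @ theta
    a_star = int(np.argmax(q_next))
    phi_next = features[s_next, a_star]

    # TD error with a greedy bootstrap target.
    delta = R[s, a] + gamma * q_next[a_star] - phi @ theta

    # Slow timescale: gradient-corrected update of theta.
    theta = theta + alpha * (delta * phi - gamma * (omega @ phi) * phi_next)
    # Fast timescale: omega tracks the projected TD error.
    omega = omega + beta * (delta - omega @ phi) * phi

    s = s_next
```

The separation of stepsizes is exactly what the paper's analysis quantifies: how fast each of `alpha` and `beta` may decay (or how large they may be) while retaining finite-sample convergence guarantees under Markovian sampling.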
