JMLR: Workshop and Conference Proceedings

Finite-sample Analysis of Greedy-GQ with Linear Function Approximation under Markovian Noise



Abstract

Greedy-GQ is an off-policy two-timescale algorithm for optimal control in reinforcement learning. This paper develops the first finite-sample analysis of the Greedy-GQ algorithm with linear function approximation under Markovian noise. Our finite-sample analysis provides theoretical justification for choosing stepsizes for this two-timescale algorithm so that it converges faster in practice, and it reveals a trade-off between the convergence rate and the quality of the obtained policy. Our paper extends the finite-sample analyses of two-timescale reinforcement learning algorithms from policy evaluation to optimal control, which is of greater practical interest. Specifically, in contrast to existing finite-sample analyses of two-timescale methods, e.g., GTD, GTD2, and TDC, whose objective functions are convex, the objective function of the Greedy-GQ algorithm is non-convex. Moreover, the Greedy-GQ algorithm is not a linear two-timescale stochastic approximation algorithm. The techniques in this paper provide a general framework for the finite-sample analysis of non-convex value-based reinforcement learning algorithms for optimal control.
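To make the two-timescale structure concrete, below is a minimal sketch of the Greedy-GQ update rule (Maei et al., 2010) with linear function approximation, run on a randomly generated toy MDP. The MDP, features, and stepsizes are illustrative assumptions and not the paper's setup; the key point is that the auxiliary iterate `omega` uses a larger stepsize (fast timescale) than the main iterate `theta` (slow timescale), and samples are drawn from a single Markovian trajectory.

```python
import numpy as np

# Hypothetical toy setup: a random 3-state, 2-action MDP with
# 4-dimensional linear features phi(s, a). All quantities here are
# illustrative assumptions, not the paper's experimental configuration.
rng = np.random.default_rng(0)
n_states, n_actions, d = 3, 2, 4
gamma = 0.9
features = rng.normal(size=(n_states, n_actions, d))        # phi(s, a)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.normal(size=(n_states, n_actions))

theta = np.zeros(d)   # main (slow-timescale) iterate
omega = np.zeros(d)   # auxiliary (fast-timescale) iterate
alpha, beta = 0.005, 0.05   # slow / fast stepsizes, beta > alpha

s = 0
for t in range(5000):
    a = int(rng.integers(n_actions))            # behavior policy: uniform
    s_next = int(rng.choice(n_states, p=P[s, a]))  # Markovian sample
    phi = features[s, a]

    # Greedy next action under the current theta (the max in the target
    # is what makes the objective non-convex).
    q_next = features[s_next] @ theta
    a_star = int(np.argmax(q_next))
    phi_next = features[s_next, a_star]

    # TD error with a greedy bootstrap target.
    delta = R[s, a] + gamma * q_next[a_star] - phi @ theta

    # Slow timescale: gradient-corrected update of theta.
    theta = theta + alpha * (delta * phi - gamma * (omega @ phi) * phi_next)
    # Fast timescale: omega tracks the projected TD error.
    omega = omega + beta * (delta - omega @ phi) * phi

    s = s_next
```

The separation of stepsizes is exactly what the paper's analysis quantifies: how fast each of `alpha` and `beta` may decay (or how large they may be) while retaining finite-sample convergence guarantees under Markovian sampling.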
