Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost

Abstract

Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind. In a broader context, actor-critic can be viewed as an online alternating update algorithm for bilevel optimization, whose convergence is known to be fragile. To understand the instability of actor-critic, we focus on its application to linear quadratic regulators, a simple yet fundamental setting of reinforcement learning. We establish a nonasymptotic convergence analysis of actor-critic in this setting. In particular, we prove that actor-critic finds a globally optimal pair of actor (policy) and critic (action-value function) at a linear rate of convergence. Our analysis may serve as a preliminary step towards a complete theoretical understanding of bilevel optimization with nonconvex subproblems, which is NP-hard in the worst case and is often solved using heuristics.
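The abstract describes actor-critic as an alternating update for a bilevel problem on the linear quadratic regulator. The Python sketch below illustrates that alternation on a scalar LQR. It is a simplified, model-based caricature built on stated assumptions, not the authors' algorithm. The assumed dynamics are x_{t+1} = a x_t + b u_t + w_t with a linear policy u = -k x and stage cost q x^2 + r u^2. The names actor_critic_lqr and riccati_gain are hypothetical. The critic here does not learn from samples. Instead, it runs a few exact, warm-started Lyapunov fixed-point sweeps, which act as inexact policy evaluation. The actor then takes a natural-policy-gradient step using the quadratic Q-function blocks that the critic implies.

```python
def riccati_gain(a, b, q, r, iters=1000):
    """Optimal gain k* for the scalar LQR, by value iteration on the Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def actor_critic_lqr(a, b, q, r, k0, eta=0.1, critic_steps=5, outer_steps=300):
    """Alternate critic and actor updates for x' = a x + b u + w, u = -k x.

    Stage cost is q x^2 + r u^2. The critic tracks p, the value coefficient of
    the current policy; once the critic has converged, the ergodic cost for
    noise variance s2 is s2 * p. (Hypothetical helper, model-based sketch.)
    """
    k, p = k0, 0.0
    for _ in range(outer_steps):
        closed_loop = a - b * k
        if abs(closed_loop) >= 1.0:
            raise ValueError("policy does not stabilize the system")
        # Critic: a few warm-started sweeps of the Lyapunov fixed point
        # p = q + r k^2 + (a - b k)^2 p, i.e. inexact policy evaluation.
        for _ in range(critic_steps):
            p = q + r * k * k + closed_loop ** 2 * p
        # Blocks of the quadratic Q-function Q(x, u) implied by the critic.
        theta_uu = r + b * b * p
        theta_ux = a * b * p
        # Actor: natural policy gradient step on the gain.
        k = k - eta * (theta_uu * k - theta_ux)
    return k, p


if __name__ == "__main__":
    k, p = actor_critic_lqr(a=1.2, b=1.0, q=1.0, r=1.0, k0=1.0)
    print(f"actor-critic gain: {k:.6f}")
    print(f"Riccati gain:      {riccati_gain(1.2, 1.0, 1.0, 1.0):.6f}")
```

Warm-starting the critic and running only a few sweeps per actor step mimics an inexact inner solve of the bilevel problem. With exact policy evaluation and eta = 1/theta_uu, the actor step reduces to classical policy iteration.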

Bibliographic Record

Source: Conference on Neural Information Processing Systems | 2020 | pp. 7960-8760 | 13 pages
Authors: Zhuoran Yang; Yongxin Chen; Mingyi Hong; Zhaoran Wang
Original format: PDF
Chinese Library Classification: Metrology

Similar Literature

1. Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator [J]. Maryam Fazel, Rong Ge, Sham Kakade. JMLR: Workshop and Conference Proceedings. 2018, Issue 2011
2. Linear-Quadratic N-Person and Mean-Field Games with Ergodic Cost [J]. Martino Bardi, Fabio S. Priuli. SIAM Journal on Control and Optimization. 2014, Issue 5
3. Stochastic linear quadratic regulators with indefinite control weight costs. II [J]. S. P. Chen, X. Y. Zhou. SIAM Journal on Control and Optimization. 2000, Issue 4
4. Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost [C]. Zhuoran Yang, Yongxin Chen, Mingyi Hong. Conference on Neural Information Processing Systems. 2020
5. A globally convergent quadratic penalty function method with fast local convergence properties [D]. Christian George Hempel. 1990
6. Quadratic convergence of monotone iterates for semilinear elliptic obstacle problems [O]. Jinping Zeng, Haowen Chen, Hongru Xu
7. Linear-Quadratic N-Person and Mean-Field Games with Ergodic Cost [O]. M. Bardi, F. S. Priuli. 2014
8. Optimal Discrete-Time LQR (Linear Quadratic Regulator) Problems for Parabolic Systems with Unbounded Input: Approximation and Convergence [R]. I. G. Rosen. 1988