Conference on Neural Information Processing Systems (NeurIPS)

Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost

Abstract

Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind. In a broader context, actor-critic can be viewed as an online alternating update algorithm for bilevel optimization, whose convergence is known to be fragile. To understand the instability of actor-critic, we focus on its application to linear quadratic regulators, a simple yet fundamental setting of reinforcement learning. We establish a nonasymptotic convergence analysis of actor-critic in this setting. In particular, we prove that actor-critic finds a globally optimal pair of actor (policy) and critic (action-value function) at a linear rate of convergence. Our analysis may serve as a preliminary step towards a complete theoretical understanding of bilevel optimization with nonconvex subproblems, which is NP-hard in the worst case and is often solved using heuristics.
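The alternating (bilevel) structure described above can be illustrated with a minimal sketch in Python. It is not the authors' algorithm, and it rests on several assumptions. The system is a scalar LQR x_{t+1} = a x_t + b u_t + w_t with w_t ~ N(0, ψ). The policy is the deterministic linear policy u = -k x. The critic evaluates the policy exactly from the known model rather than from samples. The actor takes a natural-policy-gradient-style step k ← k − η(Θ22 k − Θ21) using the critic's Q-function coefficients. All function names are hypothetical. Under these assumptions the ergodic (average) cost is J(k) = ψ P_k, where P_k solves the Lyapunov equation of the closed-loop system, and the optimum is given by the discrete algebraic Riccati equation.

```python
import math


def critic_evaluate(k, a, b, q, r):
    """Critic (exact, model-based): evaluate the linear policy u = -k x.

    Solves the scalar Lyapunov equation P = q + r k^2 + (a - b k)^2 P and returns
    P together with the cross and action coefficients of the quadratic Q-function
    Q_k(x, u) = (q + a^2 P) x^2 + 2 (a b P) x u + (r + b^2 P) u^2 + const.
    """
    closed_loop = a - b * k
    if abs(closed_loop) >= 1.0:
        raise ValueError("k does not stabilize the system; ergodic cost is infinite")
    p = (q + r * k * k) / (1.0 - closed_loop ** 2)
    theta_21 = a * b * p      # coefficient pairing x with u
    theta_22 = r + b * b * p  # coefficient of u^2
    return p, theta_21, theta_22


def actor_step(k, theta_21, theta_22, eta):
    """Actor: natural-policy-gradient-style update k <- k - eta * (theta_22 k - theta_21)."""
    return k - eta * (theta_22 * k - theta_21)


def optimal_gain(a, b, q, r):
    """Reference optimum from the scalar discrete algebraic Riccati equation."""
    # The DARE reduces to b^2 P^2 + (r - q b^2 - a^2 r) P - q r = 0; take the positive root.
    lin = r - q * b * b - a * a * r
    p_star = (-lin + math.sqrt(lin * lin + 4.0 * b * b * q * r)) / (2.0 * b * b)
    return a * b * p_star / (r + b * b * p_star), p_star


def actor_critic_lqr(a, b, q, r, k0, eta, iters, noise_var=1.0):
    """Alternate one critic evaluation and one actor step per iteration."""
    k = k0
    costs = []
    for _ in range(iters):
        p, theta_21, theta_22 = critic_evaluate(k, a, b, q, r)
        costs.append(noise_var * p)  # ergodic cost J(k) = psi * P_k
        k = actor_step(k, theta_21, theta_22, eta)
    return k, costs


if __name__ == "__main__":
    # Open-loop unstable plant (a = 1.2); k0 = 1.0 is stabilizing since |1.2 - 1.0| < 1.
    k_star, p_star = optimal_gain(1.2, 1.0, 1.0, 1.0)
    k, costs = actor_critic_lqr(1.2, 1.0, 1.0, 1.0, k0=1.0, eta=0.2, iters=60)
    print(f"learned k = {k:.6f}, Riccati k* = {k_star:.6f}")
    print(f"ergodic cost: first {costs[0]:.4f}, last {costs[-1]:.4f}, optimum {p_star:.4f}")
```

Setting η = 1/Θ22 at every step would turn the actor update into exact policy iteration, k ← Θ21/Θ22. The smaller fixed step η = 0.2 used here keeps the gain stabilizing and, in this particular instance, contracts toward the Riccati gain k* at a linear rate while the ergodic cost decreases monotonically. In the paper's setting the critic is instead estimated from data, which is where the fragile coupling between the two levels arises.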
