Journal of Machine Learning Research

On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift

Abstract

Policy gradient methods are among the most effective methods in challenging reinforcement learning problems with large state and/or action spaces. However, little is known about even their most basic theoretical convergence properties, including: if and how fast they converge to a globally optimal solution or how they cope with approximation error due to using a restricted class of parametric policies. This work provides provable characterizations of the computational, approximation, and sample size properties of policy gradient methods in the context of discounted Markov Decision Processes (MDPs). We focus on both: "tabular" policy parameterizations, where the optimal policy is contained in the class and where we show global convergence to the optimal policy; and parametric policy classes (considering both log-linear and neural policy classes), which may not contain the optimal policy and where we provide agnostic learning results. One central contribution of this work is in providing approximation guarantees that are average case --- which avoid explicit worst-case dependencies on the size of state space --- by making a formal connection to supervised learning under distribution shift. This characterization shows an important interplay between estimation error, approximation error, and exploration (as characterized through a precisely defined condition number).
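
Below is a minimal sketch of the tabular softmax setting the abstract refers to: exact policy gradient ascent on a small synthetic MDP, using the standard softmax gradient dV^pi(rho)/dtheta(s,a) = (1/(1-gamma)) * d_rho^pi(s) * pi(a|s) * A^pi(s,a). The toy MDP (P, r, rho), the step size, and the iteration count are illustrative assumptions, not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP; all quantities below are illustrative assumptions.
S, A, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] transition kernel
r = rng.uniform(size=(S, A))                 # r[s, a] rewards in [0, 1]
rho = np.full(S, 1.0 / S)                    # start-state distribution

theta = np.zeros((S, A))                     # tabular softmax parameters

def policy(theta):
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

for t in range(2000):
    pi = policy(theta)
    P_pi = np.einsum("sa,sat->st", pi, P)    # state-to-state kernel under pi
    r_pi = (pi * r).sum(axis=1)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)  # V^pi, exact
    Q = r + gamma * P @ V                                 # Q^pi(s, a)
    # Discounted state-visitation distribution d_rho^pi.
    d = (1 - gamma) * np.linalg.solve((np.eye(S) - gamma * P_pi).T, rho)
    # Exact softmax policy gradient: (1/(1-gamma)) d(s) pi(a|s) A^pi(s, a).
    adv = Q - V[:, None]
    grad = d[:, None] * pi * adv / (1 - gamma)
    theta += 0.5 * grad                       # plain gradient ascent step

print("V^pi(rho) after training:", rho @ V)

For reference, the "precisely defined condition number" in the final sentence is a distribution mismatch coefficient, roughly of the form ||d_rho^{pi*}/mu||_inf, measuring how well the measure mu used for optimization covers the states visited by an optimal policy.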