Source: JMLR: Workshop and Conference Proceedings

A Hybrid Stochastic Policy Gradient Algorithm for Reinforcement Learning


Abstract

We propose a novel hybrid stochastic policy gradient estimator by combining an unbiased policy gradient estimator, the REINFORCE estimator, with a biased one, an adapted SARAH estimator, for policy optimization. The hybrid policy gradient estimator is shown to be biased, but it has a variance-reduced property. Using this estimator, we develop a new Proximal Hybrid Stochastic Policy Gradient Algorithm (ProxHSPGA) to solve a composite policy optimization problem, which allows us to handle constraints or regularizers on the policy parameters. We first propose a single-loop algorithm and then introduce a more practical restarting variant. We prove that both algorithms achieve the best-known trajectory complexity for attaining a first-order stationary point of the composite problem, which improves on that of the existing REINFORCE/GPOMDP and SVRPG methods in the non-composite setting. We evaluate the performance of our algorithm on several well-known examples in reinforcement learning. Numerical results show that our algorithm outperforms two existing methods on these examples. Moreover, the composite setting indeed has some advantages over the non-composite one on certain problems.
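The idea of the hybrid estimator can be illustrated with a short sketch: a convex combination of an unbiased REINFORCE term and a SARAH-style recursive term corrected by importance weights. The Python fragment below is a minimal, hypothetical illustration of that combination only; the helper names reinforce_grad and importance_weight, their signatures, and the batch arguments are assumptions for exposition, not the paper's actual implementation.

    def hybrid_policy_gradient(v_prev, theta_prev, theta_curr,
                               trajs, trajs_indep,
                               reinforce_grad, importance_weight, beta=0.5):
        # Unbiased REINFORCE term, computed on an independent batch of
        # trajectories sampled under the current policy theta_curr.
        g_unbiased = reinforce_grad(trajs_indep, theta_curr)

        # SARAH-style recursive term: reuse the previous estimate v_prev and
        # correct it with a gradient difference on a shared batch. Because
        # `trajs` were sampled under theta_curr, the gradient at theta_prev
        # is reweighted by per-trajectory importance weights (hypothetical
        # `weights` keyword of the assumed helper).
        w = importance_weight(trajs, theta_prev, theta_curr)
        g_sarah = (v_prev
                   + reinforce_grad(trajs, theta_curr)
                   - reinforce_grad(trajs, theta_prev, weights=w))

        # Convex combination of the two terms: biased overall, but with
        # reduced variance compared to plain REINFORCE.
        return beta * g_sarah + (1.0 - beta) * g_unbiased

In a proximal variant, the returned estimate would be used in a proximal (projected) policy update to handle the constraints or regularizers mentioned above.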
