JMLR: Workshop and Conference Proceedings

GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms

Abstract

In continuous action domains, standard deep reinforcement learning algorithms like DDPG suffer from inefficient exploration when facing sparse or deceptive reward problems. Conversely, evolutionary and developmental methods focusing on exploration like Novelty Search, Quality-Diversity or Goal Exploration Processes explore more robustly but are less efficient at fine-tuning policies using gradient-descent. In this paper, we present the GEP-PG approach, taking the best of both worlds by sequentially combining a Goal Exploration Process and two variants of DDPG. We study the learning performance of these components and their combination on a low dimensional deceptive reward problem and on the larger Half-Cheetah benchmark. We show that DDPG fails on the former and that GEP-PG improves over the best DDPG variant in both environments.
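The abstract only states that the Goal Exploration Process and DDPG are combined sequentially: an exploration phase first produces diverse trajectories, which are then handed to the off-policy learner before gradient-based fine-tuning begins. The Python sketch below illustrates that sequential structure under explicit assumptions. It is not the authors' code: the toy one-dimensional environment, the linear policy parameterisation, the goal space, and the nearest-neighbour mapping from sampled goals to archived policies are all illustrative choices, and the DDPG fine-tuning phase is left as a stub comment since only the hand-off of collected transitions is being shown.

# Minimal sketch (assumptions as noted above) of the sequential GEP -> DDPG idea:
# a goal-directed exploration phase fills a replay buffer with diverse
# transitions, which would then bootstrap an off-policy gradient learner.

import numpy as np

rng = np.random.default_rng(0)

def rollout(theta, horizon=50):
    """Toy 1-D environment: a linear policy theta acts on the state.
    Returns transitions (s, a, r, s') and an 'outcome' (final state)."""
    s, transitions = 0.0, []
    for _ in range(horizon):
        a = float(np.clip(theta[0] * s + theta[1], -1.0, 1.0))
        s_next = s + 0.1 * a + rng.normal(scale=0.01)
        r = -abs(s_next - 1.0)          # simple distance-to-target reward (toy)
        transitions.append((s, a, r, s_next))
        s = s_next
    return transitions, s               # outcome = final state reached

# ---- Phase 1: Goal Exploration Process (simplified) -------------------------
replay_buffer, archive = [], []         # archive holds (outcome, theta) pairs

for episode in range(200):
    if len(archive) < 20:               # bootstrap with random policies
        theta = rng.uniform(-1.0, 1.0, size=2)
    else:
        goal = rng.uniform(-2.0, 2.0)   # sample a goal in outcome space
        # pick the archived policy whose outcome is closest to the goal ...
        nearest = min(archive, key=lambda e: abs(e[0] - goal))
        # ... and perturb its parameters to try to reach that goal
        theta = nearest[1] + rng.normal(scale=0.1, size=2)
    transitions, outcome = rollout(theta)
    archive.append((outcome, theta))
    replay_buffer.extend(transitions)   # every transition is kept for phase 2

# ---- Phase 2: off-policy fine-tuning (DDPG-like, stubbed) -------------------
# The learner's replay buffer would be pre-loaded with the GEP transitions
# before any gradient step; actor/critic updates then proceed as usual.
print(f"GEP phase collected {len(replay_buffer)} transitions, "
      f"outcomes in [{min(o for o, _ in archive):.2f}, "
      f"{max(o for o, _ in archive):.2f}]")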