JMLR: Workshop and Conference Proceedings

GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms

Abstract

In continuous action domains, standard deep reinforcement learning algorithms like DDPG suffer from inefficient exploration when facing sparse or deceptive reward problems. Conversely, evolutionary and developmental methods focusing on exploration like Novelty Search, Quality-Diversity or Goal Exploration Processes explore more robustly but are less efficient at fine-tuning policies using gradient-descent. In this paper, we present the GEP-PG approach, taking the best of both worlds by sequentially combining a Goal Exploration Process and two variants of DDPG. We study the learning performance of these components and their combination on a low dimensional deceptive reward problem and on the larger Half-Cheetah benchmark. We show that DDPG fails on the former and that GEP-PG improves over the best DDPG variant in both environments.
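The abstract only states that the Goal Exploration Process and DDPG are combined sequentially: an exploration phase first produces diverse trajectories, which are then handed to the off-policy learner before gradient-based fine-tuning begins. The Python sketch below illustrates that sequential structure under explicit assumptions. It is not the authors' code: the toy one-dimensional environment, the linear policy parameterisation, the goal space, and the nearest-neighbour mapping from sampled goals to archived policies are all illustrative choices, and the DDPG fine-tuning phase is left as a stub comment since only the hand-off of collected transitions is being shown.

# Minimal sketch (assumptions as noted above) of the sequential GEP -> DDPG idea:
# a goal-directed exploration phase fills a replay buffer with diverse
# transitions, which would then bootstrap an off-policy gradient learner.

import numpy as np

rng = np.random.default_rng(0)

def rollout(theta, horizon=50):
    """Toy 1-D environment: a linear policy theta acts on the state.
    Returns transitions (s, a, r, s') and an 'outcome' (final state)."""
    s, transitions = 0.0, []
    for _ in range(horizon):
        a = float(np.clip(theta[0] * s + theta[1], -1.0, 1.0))
        s_next = s + 0.1 * a + rng.normal(scale=0.01)
        r = -abs(s_next - 1.0)          # simple distance-to-target reward (toy)
        transitions.append((s, a, r, s_next))
        s = s_next
    return transitions, s               # outcome = final state reached

# ---- Phase 1: Goal Exploration Process (simplified) -------------------------
replay_buffer, archive = [], []         # archive holds (outcome, theta) pairs

for episode in range(200):
    if len(archive) < 20:               # bootstrap with random policies
        theta = rng.uniform(-1.0, 1.0, size=2)
    else:
        goal = rng.uniform(-2.0, 2.0)   # sample a goal in outcome space
        # pick the archived policy whose outcome is closest to the goal ...
        nearest = min(archive, key=lambda e: abs(e[0] - goal))
        # ... and perturb its parameters to try to reach that goal
        theta = nearest[1] + rng.normal(scale=0.1, size=2)
    transitions, outcome = rollout(theta)
    archive.append((outcome, theta))
    replay_buffer.extend(transitions)   # every transition is kept for phase 2

# ---- Phase 2: off-policy fine-tuning (DDPG-like, stubbed) -------------------
# The learner's replay buffer would be pre-loaded with the GEP transitions
# before any gradient step; actor/critic updates then proceed as usual.
print(f"GEP phase collected {len(replay_buffer)} transitions, "
      f"outcomes in [{min(o for o, _ in archive):.2f}, "
      f"{max(o for o, _ in archive):.2f}]")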