IEEE International Conference on Robot and Human Interactive Communication

Goal Density-based Hindsight Experience Prioritization for Multi-Goal Robot Manipulation Reinforcement Learning


Abstract

Reinforcement learning for multi-goal robot manipulation tasks is usually challenging, especially when sparse rewards are provided. It often requires millions of data samples to be collected before a stable policy is learned. Recent algorithms such as Hindsight Experience Replay (HER) have greatly accelerated the learning process by replacing the original desired goal with one of the achieved points (substitute goals) along the same trajectory. However, HER selects past experience naively: both the trajectory selection and the substitute goal sampling are completely random. In this paper, we discuss an experience prioritization strategy for HER that improves learning efficiency. We propose the Goal Density-based hindsight experience Prioritization (GDP) method, which uses the density distribution of the achieved points and prioritizes those that are rarely seen in the replay buffer. These points are used as substitute goals for HER. In addition, we propose a Prioritization Switching with Ensembling Strategy (PSES) method that switches between different experience prioritization algorithms during learning, allowing the best-performing one to be selected at each learning stage. We evaluate our method on several OpenAI Gym robotic manipulation tasks. The results show that GDP accelerates the learning process in most tasks and can be further improved when combined with other prioritization methods via PSES.
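The abstract only outlines GDP, so the following Python sketch is an illustrative reconstruction rather than the authors' implementation: the function name gdp_sample_substitute_goals, the Gaussian kernel density estimate, and the inverse-density sampling weights are all assumptions about how "prioritizing achieved points rarely seen in the replay buffer" might be realized.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def gdp_sample_substitute_goals(achieved_goals, n_samples, bandwidth=0.1, rng=None):
    """Sample substitute-goal indices for HER, favoring rarely seen achieved points.

    achieved_goals: (N, goal_dim) array of achieved points from the replay buffer.
    Returns n_samples indices into achieved_goals.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Estimate the density distribution of achieved points in goal space.
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(achieved_goals)
    log_density = kde.score_samples(achieved_goals)  # log p(g) for each point
    # Inverse-density priority: low-density (rare) points get high weight.
    priority = 1.0 / (np.exp(log_density) + 1e-8)
    probs = priority / priority.sum()
    return rng.choice(len(achieved_goals), size=n_samples, p=probs)
```

Used inside a HER replay step, the sampled points would replace the episode's desired goal before rewards are recomputed.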
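Likewise, PSES is described only at a high level. The sketch below assumes a simple moving-average success criterion for switching between prioritization strategies; the class name PrioritizationSwitcher, the window size, and the greedy selection rule are illustrative, not taken from the paper.

```python
import numpy as np

class PrioritizationSwitcher:
    """Illustrative PSES-style switcher: track recent success of each
    prioritization strategy and route sampling to the current best one."""

    def __init__(self, strategies, window=50):
        self.strategies = strategies  # name -> sampling function (e.g., GDP)
        self.scores = {name: [] for name in strategies}
        self.window = window

    def record(self, name, success):
        """Log a success (1) or failure (0) for an episode trained under `name`."""
        self.scores[name].append(float(success))
        self.scores[name] = self.scores[name][-self.window:]

    def select(self):
        """Return the sampling function with the best recent average success."""
        def recent_avg(name):
            s = self.scores[name]
            return np.mean(s) if s else 0.0
        best = max(self.strategies, key=recent_avg)
        return self.strategies[best]
```

Under this reading, the ensemble keeps every prioritization algorithm available and simply re-evaluates which one to use as learning progresses.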
