IEEE International Conference on Robot and Human Interactive Communication

Goal Density-based Hindsight Experience Prioritization for Multi-Goal Robot Manipulation Reinforcement Learning


Abstract

Reinforcement learning for multi-goal robot manipulation tasks is usually challenging, especially when sparse rewards are provided. It often requires millions of data samples to be collected before a stable policy is learned. Recent algorithms such as Hindsight Experience Replay (HER) have greatly accelerated the learning process by replacing the original desired goal with one of the achieved points (substitute goals) along the same trajectory. However, HER selects past experience naively: both the trajectory selection and the substitute goal sampling are completely random. In this paper, we discuss an experience prioritization strategy for HER that improves learning efficiency. We propose the Goal Density-based hindsight experience Prioritization (GDP) method, which uses the density distribution of the achieved points and prioritizes those that are rarely seen in the replay buffer. These points are used as substitute goals for HER. In addition, we propose a Prioritization Switching with Ensembling Strategy (PSES) method that switches between different experience prioritization algorithms during learning, allowing the best-performing one to be selected at each learning stage. We evaluate our method on several OpenAI Gym robotic manipulation tasks. The results show that GDP accelerates the learning process in most tasks and can be further improved when combined with other prioritization methods via PSES.
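The abstract only outlines GDP, so the following Python sketch is an illustrative reconstruction rather than the authors' implementation: the function name gdp_sample_substitute_goals, the Gaussian kernel density estimate, and the inverse-density sampling weights are all assumptions about how "prioritizing achieved points rarely seen in the replay buffer" might be realized.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def gdp_sample_substitute_goals(achieved_goals, n_samples, bandwidth=0.1, rng=None):
    """Sample substitute-goal indices for HER, favoring rarely seen achieved points.

    achieved_goals: (N, goal_dim) array of achieved points from the replay buffer.
    Returns n_samples indices into achieved_goals.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Estimate the density distribution of achieved points in goal space.
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(achieved_goals)
    log_density = kde.score_samples(achieved_goals)  # log p(g) for each point
    # Inverse-density priority: low-density (rare) points get high weight.
    priority = 1.0 / (np.exp(log_density) + 1e-8)
    probs = priority / priority.sum()
    return rng.choice(len(achieved_goals), size=n_samples, p=probs)
```

Used inside a HER replay step, the sampled points would replace the episode's desired goal before rewards are recomputed.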
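Likewise, PSES is described only at a high level. The sketch below assumes a simple moving-average success criterion for switching between prioritization strategies; the class name PrioritizationSwitcher, the window size, and the greedy selection rule are illustrative, not taken from the paper.

```python
import numpy as np

class PrioritizationSwitcher:
    """Illustrative PSES-style switcher: track recent success of each
    prioritization strategy and route sampling to the current best one."""

    def __init__(self, strategies, window=50):
        self.strategies = strategies  # name -> sampling function (e.g., GDP)
        self.scores = {name: [] for name in strategies}
        self.window = window

    def record(self, name, success):
        """Log a success (1) or failure (0) for an episode trained under `name`."""
        self.scores[name].append(float(success))
        self.scores[name] = self.scores[name][-self.window:]

    def select(self):
        """Return the sampling function with the best recent average success."""
        def recent_avg(name):
            s = self.scores[name]
            return np.mean(s) if s else 0.0
        best = max(self.strategies, key=recent_avg)
        return self.strategies[best]
```

Under this reading, the ensemble keeps every prioritization algorithm available and simply re-evaluates which one to use as learning progresses.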
