Bias-reduced hindsight experience replay with virtual goal prioritization

Manela B.; Biess A.


Abstract

Hindsight Experience Replay (HER) is a multi-goal reinforcement learning algorithm for sparse reward functions. The algorithm treats every failure as a success for an alternative (virtual) goal that has been achieved in the episode. Virtual goals are randomly selected, irrespective of which are most instructive for the agent. In this paper, we present two improvements over the existing HER algorithm. First, we prioritize virtual goals from which the agent will learn more valuable information. We call this property the instructiveness of the virtual goal and define it by a heuristic measure, which expresses how well the agent will be able to generalize from that virtual goal to actual goals. Second, we reduce existing bias in HER by removing misleading samples. To test our algorithms, we built three challenging environments with sparse reward functions. Our empirical results in these environments show vast improvement in the final success rate and sample efficiency when compared to the original HER algorithm. A video showing experimental results is available at https://youtu.be/xjAiwJiSeLc. © 2021 Published by Elsevier B.V.
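The relabeling idea the abstract describes for the original HER algorithm, where a failed transition is stored again as a success for a goal that was actually achieved later in the episode, can be sketched as follows. This is a minimal illustration of uniform (random) virtual-goal selection only. It is not the authors' prioritized or bias-reduced method, whose instructiveness measure is not given in this record. The 1-D goal space, the tuple layout, the 0/-1 reward convention, the tolerance, and the names `sparse_reward` and `her_relabel` are all assumptions made for the example.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical transition layout: (state, action, achieved_goal, desired_goal).
Transition = Tuple[float, float, float, float]
# Hypothetical replay sample layout: (state, action, goal, reward).
Sample = Tuple[float, float, float, float]


def sparse_reward(achieved: float, goal: float, tol: float = 0.05) -> float:
    """Assumed sparse reward: 0.0 if the goal is reached within tol, else -1.0."""
    return 0.0 if abs(achieved - goal) <= tol else -1.0


def her_relabel(
    episode: List[Transition], k: int = 4, rng: Optional[random.Random] = None
) -> List[Sample]:
    """Return the original samples plus k relabeled copies per transition.

    Virtual goals are drawn uniformly from goals achieved at the current or a
    later step of the same episode, with no prioritization.
    """
    if rng is None:
        rng = random.Random()
    samples: List[Sample] = []
    for t, (state, action, achieved, goal) in enumerate(episode):
        # Keep the transition with its actual goal (usually a failure).
        samples.append((state, action, goal, sparse_reward(achieved, goal)))
        for _ in range(k):
            # Pick a goal achieved at step t or later and treat it as the target.
            future = rng.randrange(t, len(episode))
            virtual_goal = episode[future][2]
            reward = sparse_reward(achieved, virtual_goal)
            samples.append((state, action, virtual_goal, reward))
    return samples


if __name__ == "__main__":
    # A short failed episode: the agent moves 0.0 -> 0.3 but the goal is 1.0.
    episode = [(0.0, 1.0, 0.1, 1.0), (0.1, 1.0, 0.2, 1.0), (0.2, 1.0, 0.3, 1.0)]
    for sample in her_relabel(episode, k=2, rng=random.Random(0)):
        print(sample)
```

Because the virtual goals are drawn uniformly, some relabeled samples may teach the agent little about the actual goal. That is the gap the abstract's prioritization of instructive virtual goals is meant to address.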


Bibliographic record

Source
Neurocomputing | 2021, Issue 3 | pp. 305-315 | 11 pages
Authors
Manela B.; Biess A.
Author affiliations

Ben Gurion Univ Negev Dept Ind Engn & Management Beer Sheva Israel;

Ben Gurion Univ Negev Dept Ind Engn & Management Beer Sheva Israel;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Keywords
Multi-goal reinforcement learning; Hindsight Experience Replay; Sparse reward function; Virtual goals;


Similar literature

1. Continuous shared control in prosthetic hand grasp tasks by Deep Deterministic Policy Gradient with Hindsight Experience Replay [J]. Zhaolong Gao, Rongyu Tang, Luyao Chen, International Journal of Advanced Robotic Systems. 2020, Issue 4
2. Soft Actor-Critic Reinforcement Learning for Robotic Manipulator with Hindsight Experience Replay [J]. Yan Tao, Zhang Wenan, Yang Simon X., International Journal of Robotics & Automation. 2019, Issue 5
3. Generating attentive goals for prioritized hindsight reinforcement learning [J]. Liu Peng, Bai Chenjia, Zhao Yingnan, Knowledge-Based Systems. 2020, Issue Sepa5
4. Hindsight-Combined and Hindsight-Prioritized Experience Replay [C]. Renzo Roel P. Tan, Kazushi Ikeda, John Paul C. Vergara, International Conference on Neural Information Processing. 2020
5. Checkpoint Hindsight Experience Replay, Intuitive Application of Domain Knowledge in Reward-sparse Environments [D]. Wyss, Eric K. 2020
6. Path Planning for Multi-Arm Manipulators Using Deep Reinforcement Learning: Soft Actor–Critic with Hindsight Experience Replay [O]. Evan Prianto, MyeongSeop Kim, Jae-Han Park. 2020
7. Hindsight Experience Replay Improves Reinforcement Learning for Control of a MIMO Musculoskeletal Model of the Human Arm [O]. Douglas C. Crowder, Jessica Abreu, Robert F. Kirsch. 2021
