Journal: Neurocomputing

Bias-reduced hindsight experience replay with virtual goal prioritization

Abstract

Hindsight Experience Replay (HER) is a multi-goal reinforcement learning algorithm for sparse reward functions. The algorithm treats every failure as a success for an alternative (virtual) goal that has been achieved in the episode. Virtual goals are randomly selected, irrespective of which are most instructive for the agent. In this paper, we present two improvements over the existing HER algorithm. First, we prioritize virtual goals from which the agent will learn more valuable information. We call this property the instructiveness of the virtual goal and define it by a heuristic measure, which expresses how well the agent will be able to generalize from that virtual goal to actual goals. Second, we reduce existing bias in HER by the removal of misleading samples. To test our algorithms, we built three challenging environments with sparse reward functions. Our empirical results in these environments show vast improvement in the final success rate and sample efficiency when compared to the original HER algorithm. A video showing experimental results is available at https://youtu.be/xjAiwJiSeLc. © 2021 Published by Elsevier B.V.
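
The relabeling step that the abstract describes for the original HER algorithm can be illustrated with a minimal sketch. It is not the authors' implementation. The transition layout, the function names `sparse_reward` and `her_relabel`, the distance tolerance `tol`, and the 0/-1 reward convention are all assumptions made for illustration. The optional `weights` argument only marks where a prioritization scheme could plug in. The abstract does not define the instructiveness measure, so it is not implemented here, and neither is the removal of misleading samples.

```python
import random


def sparse_reward(achieved, goal, tol=0.05):
    """Hypothetical sparse reward: 0.0 if the goal is reached, else -1.0."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def her_relabel(episode, k=4, weights=None, rng=random):
    """Return the original transitions plus k relabeled copies of each.

    episode: list of dicts with keys 'obs', 'action', 'achieved', 'goal',
             where 'achieved' is the goal state reached after the action.
    weights: optional per-step weights for picking virtual goals.
             None means uniform random selection, as in original HER.
             (Assumed hook for prioritization; no measure is defined here.)
    """
    # Every state reached in the episode is a candidate virtual goal.
    achieved_goals = [t["achieved"] for t in episode]
    out = []
    for t in episode:
        # Keep the transition with its actual goal and actual reward.
        out.append({**t, "reward": sparse_reward(t["achieved"], t["goal"])})
        # Add copies whose goal is replaced by a virtual goal, with the
        # reward recomputed against that virtual goal.
        for g in rng.choices(achieved_goals, weights=weights, k=k):
            out.append({**t, "goal": g,
                        "reward": sparse_reward(t["achieved"], g)})
    return out
```

With `weights=None` every achieved goal in the episode is equally likely to be chosen, which corresponds to the random selection the abstract attributes to original HER. A failed episode then still yields some transitions with a success reward, namely those whose virtual goal matches the state actually reached.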
