International Conference on Machine Learning

Maximum Entropy-Regularized Multi-Goal Reinforcement Learning



Abstract

In Multi-Goal Reinforcement Learning, an agent learns to achieve multiple goals with a goal-conditioned policy. During learning, the agent first collects trajectories into a replay buffer, and later these trajectories are selected randomly for replay. However, the achieved goals in the replay buffer are often biased towards the behavior policies. From a Bayesian perspective, when there is no prior knowledge about the target goal distribution, the agent should learn uniformly from diverse achieved goals. Therefore, we first propose a novel multi-goal RL objective based on weighted entropy. This objective encourages the agent to maximize the expected return as well as to achieve more diverse goals. Secondly, we develop a maximum entropy-based prioritization framework to optimize the proposed objective. To evaluate this framework, we combine it with Deep Deterministic Policy Gradient, both with and without Hindsight Experience Replay. On a set of multi-goal robotic tasks from OpenAI Gym, we compare our method with other baselines and show promising improvements in both performance and sample efficiency.
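The prioritization idea described above can be illustrated in code: trajectories whose achieved goals fall in low-density regions of the buffer are replayed more often, pushing the distribution of replayed goals toward uniform (higher entropy). The sketch below is illustrative only, using a simple Gaussian kernel density estimate; the function name, bandwidth, and estimator are assumptions for exposition, not the paper's exact implementation.

```python
import numpy as np

def entropy_prioritized_sample(achieved_goals, batch_size, rng=None):
    """Sample trajectory indices, favoring rare achieved goals.

    A minimal sketch of entropy-based replay prioritization: each goal's
    density under the empirical buffer distribution is approximated with
    a Gaussian kernel estimate, and replay probability is set proportional
    to the inverse density, so under-represented goals are replayed more.
    """
    rng = np.random.default_rng() if rng is None else rng
    goals = np.asarray(achieved_goals, dtype=float)  # shape (N, goal_dim)
    # Pairwise squared distances between achieved goals.
    diffs = goals[:, None, :] - goals[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Kernel density estimate of each goal (bandwidth chosen arbitrarily here).
    bandwidth = 1.0
    density = np.exp(-sq_dists / (2 * bandwidth ** 2)).mean(axis=1)
    # Rarer goals get higher replay priority (inverse density), then normalize.
    priority = 1.0 / (density + 1e-8)
    probs = priority / priority.sum()
    return rng.choice(len(goals), size=batch_size, p=probs)
```

In a full training loop, the sampled indices would select which stored trajectories feed the off-policy update (e.g. DDPG, with or without hindsight relabeling), while the rest of the replay machinery stays unchanged.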
