International Conference on Machine Learning

Maximum Entropy-Regularized Multi-Goal Reinforcement Learning



Abstract

In Multi-Goal Reinforcement Learning, an agent learns to achieve multiple goals with a goal-conditioned policy. During learning, the agent first collects trajectories into a replay buffer, and later these trajectories are selected randomly for replay. However, the achieved goals in the replay buffer are often biased towards the behavior policies. From a Bayesian perspective, when there is no prior knowledge about the target goal distribution, the agent should learn uniformly from diverse achieved goals. Therefore, we first propose a novel multi-goal RL objective based on weighted entropy. This objective encourages the agent to maximize the expected return as well as to achieve more diverse goals. Secondly, we develop a maximum entropy-based prioritization framework to optimize the proposed objective. To evaluate this framework, we combine it with Deep Deterministic Policy Gradient, both with and without Hindsight Experience Replay. On a set of multi-goal robotic tasks from OpenAI Gym, we compare our method with other baselines and show promising improvements in both performance and sample efficiency.
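The prioritization idea described above can be illustrated in code: trajectories whose achieved goals fall in low-density regions of the buffer are replayed more often, pushing the distribution of replayed goals toward uniform (higher entropy). The sketch below is illustrative only, using a simple Gaussian kernel density estimate; the function name, bandwidth, and estimator are assumptions for exposition, not the paper's exact implementation.

```python
import numpy as np

def entropy_prioritized_sample(achieved_goals, batch_size, rng=None):
    """Sample trajectory indices, favoring rare achieved goals.

    A minimal sketch of entropy-based replay prioritization: each goal's
    density under the empirical buffer distribution is approximated with
    a Gaussian kernel estimate, and replay probability is set proportional
    to the inverse density, so under-represented goals are replayed more.
    """
    rng = np.random.default_rng() if rng is None else rng
    goals = np.asarray(achieved_goals, dtype=float)  # shape (N, goal_dim)
    # Pairwise squared distances between achieved goals.
    diffs = goals[:, None, :] - goals[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Kernel density estimate of each goal (bandwidth chosen arbitrarily here).
    bandwidth = 1.0
    density = np.exp(-sq_dists / (2 * bandwidth ** 2)).mean(axis=1)
    # Rarer goals get higher replay priority (inverse density), then normalize.
    priority = 1.0 / (density + 1e-8)
    probs = priority / priority.sum()
    return rng.choice(len(goals), size=batch_size, p=probs)
```

In a full training loop, the sampled indices would select which stored trajectories feed the off-policy update (e.g. DDPG, with or without hindsight relabeling), while the rest of the replay machinery stays unchanged.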
