JMLR: Workshop and Conference Proceedings

Latent Space Policies for Hierarchical Reinforcement Learning

Abstract

We address the problem of learning hierarchical deep neural network policies for reinforcement learning. In contrast to methods that explicitly restrict or cripple lower layers of a hierarchy to force them to use higher-level modulating signals, each layer in our framework is trained to directly solve the task, but acquires a range of diverse strategies via a maximum entropy reinforcement learning objective. Each layer is also augmented with latent random variables, which are sampled from a prior distribution during the training of that layer. The maximum entropy objective causes these latent variables to be incorporated into the layer’s policy, and the higher level layer can directly control the behavior of the lower layer through this latent space. Furthermore, by constraining the mapping from latent variables to actions to be invertible, higher layers retain full expressivity: neither the higher layers nor the lower layers are constrained in their behavior. Our experimental evaluation demonstrates that we can improve on the performance of single-layer policies on standard benchmark tasks simply by adding additional layers, and that our method can solve more complex sparse-reward tasks by learning higher-level policies on top of high-entropy skills optimized for simple low-level objectives.
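To make the mechanism in the abstract concrete, below is a minimal, hedged Python sketch of how a two-level latent-space policy could compose an action. It is not the authors' implementation: each layer is stood in for by a state-conditioned affine transform (shift and scale produced by placeholder random linear "networks"), which is invertible in its latent input as the abstract requires, and the names AffineLatentLayer and hierarchical_action are illustrative. Training each layer with the maximum entropy objective is omitted; the sketch only shows how a latent sampled from the prior (or chosen by a higher layer) is pushed through the lower layer's invertible map to produce an environment action.

```python
# Minimal sketch (not the paper's code): two stacked latent-space policy layers.
# Each layer maps its latent input z to an output through a state-conditioned,
# invertible affine transform out = mu(s) + sigma(s) * z. The actual method uses
# richer invertible networks and trains each layer with a max-entropy RL objective.
import numpy as np

rng = np.random.default_rng(0)


class AffineLatentLayer:
    """One policy layer: output = mu(state) + sigma(state) * z, invertible in z."""

    def __init__(self, state_dim, latent_dim):
        # Placeholder "networks": random linear maps for the shift and log-scale.
        self.W_mu = rng.normal(scale=0.1, size=(latent_dim, state_dim))
        self.W_logsig = rng.normal(scale=0.1, size=(latent_dim, state_dim))

    def forward(self, state, z):
        """Map a latent (from the prior, or supplied by the layer above) to an output."""
        mu = self.W_mu @ state
        sigma = np.exp(self.W_logsig @ state)
        return mu + sigma * z

    def inverse(self, state, out):
        """Invertibility: any desired lower-layer output has a unique latent preimage."""
        mu = self.W_mu @ state
        sigma = np.exp(self.W_logsig @ state)
        return (out - mu) / sigma


def hierarchical_action(layers, state, top_latent):
    """Compose the stack: the top latent is pushed down through each layer in turn."""
    z = top_latent
    for layer in reversed(layers):  # highest layer acts first, lowest layer last
        z = layer.forward(state, z)
    return z  # output of the lowest layer is the environment action


state_dim, act_dim = 4, 2
layers = [
    AffineLatentLayer(state_dim, act_dim),  # layer 0: latent z0 -> action a
    AffineLatentLayer(state_dim, act_dim),  # layer 1: latent z1 -> z0 (steers layer 0)
]

state = rng.normal(size=state_dim)
z_top = rng.normal(size=act_dim)  # sample from the standard normal prior
print("action:", hierarchical_action(layers, state, z_top))
```

In the scheme described by the abstract, the lower layer would first be trained on a simple low-level objective with its latent sampled from the prior; when the higher layer is stacked on top, that same latent space becomes the higher layer's action space, and invertibility guarantees the higher layer can still elicit any lower-layer behavior.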
