IEEE Transactions on Neural Networks and Learning Systems

Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning


Abstract

One of the main concerns of deep reinforcement learning (DRL) is the data inefficiency problem, which stems both from an inability to fully utilize acquired data and from naive exploration strategies. To alleviate these problems, we propose a DRL algorithm that aims to improve data efficiency through both the utilization of unrewarded experiences and the exploration strategy, combining ideas from unsupervised auxiliary tasks, intrinsic motivation, and hierarchical reinforcement learning (HRL). Our method is based on a simple HRL architecture with a metacontroller and a subcontroller. The subcontroller is intrinsically motivated by the metacontroller to learn to control aspects of the environment, with the intention of giving the agent: 1) a neural representation that is generically useful for tasks involving manipulation of the environment and 2) the ability to explore the environment in a temporally extended manner through the control of the metacontroller. In this way, we reinterpret the notion of pixel- and feature-control auxiliary tasks as reusable skills that can be learned via an intrinsic reward. We evaluate our method on a number of Atari 2600 games. We found that it outperforms the baseline in several environments and significantly improves performance in one of the hardest games, Montezuma's Revenge, for which the ability to utilize sparse data is key. We found that the inclusion of the intrinsic reward is crucial to the improvement in performance, and that most of the benefit seems to derive from the representations learned during training.
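The abstract describes a two-level loop: a metacontroller chooses which feature of a learned representation the agent should try to control, and a subcontroller acts for a fixed horizon, rewarded intrinsically for changing that feature, while the metacontroller is trained on the extrinsic reward. The following is a minimal sketch of that structure, not the paper's implementation: the toy environment, the frozen random-projection encoder, the absolute-change intrinsic reward, and all names (ToyEnv, encode, META_HORIZON, etc.) are illustrative assumptions, and both policies are left as random placeholders where learning updates would occur.

```python
# Minimal sketch of a metacontroller/subcontroller feature-control loop.
# All components here are hypothetical stand-ins for illustration only.
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM = 64        # flattened observation size (stand-in for pixels)
N_FEATURES = 8      # size of the feature vector the agent learns to control
META_HORIZON = 10   # subcontroller steps per metacontroller goal
N_ACTIONS = 4

# Frozen random projection standing in for a learned encoder phi(s).
W = 0.1 * rng.standard_normal((N_FEATURES, OBS_DIM))

def encode(obs):
    """Map an observation to its feature vector phi(s)."""
    return W @ obs

def intrinsic_reward(phi, phi_next, k):
    """Reward the subcontroller for changing the k-th feature between steps."""
    return abs(phi_next[k] - phi[k])

class ToyEnv:
    """Trivial random-walk environment, used only to make the loop runnable."""
    def reset(self):
        self.state = rng.standard_normal(OBS_DIM)
        return self.state

    def step(self, action):
        self.state = self.state + 0.1 * rng.standard_normal(OBS_DIM)
        extrinsic = float(rng.random() < 0.01)  # sparse extrinsic reward
        return self.state, extrinsic

env = ToyEnv()
obs = env.reset()

for _ in range(5):
    # Metacontroller decision: pick a goal "control feature k".
    # (Random here; in the paper it is a learned policy.)
    k = int(rng.integers(N_FEATURES))
    meta_return = 0.0
    for t in range(META_HORIZON):
        phi = encode(obs)
        action = int(rng.integers(N_ACTIONS))  # subcontroller policy (random here)
        obs, r_ext = env.step(action)
        r_int = intrinsic_reward(phi, encode(obs), k)
        # A real subcontroller would be updated here to maximize r_int.
        meta_return += r_ext
    # A real metacontroller would be updated here on meta_return.
    print(f"goal=feature {k}, extrinsic return over horizon: {meta_return:.2f}")
```

The structural point the sketch tries to capture is the two timescales: the metacontroller acts once per horizon on the sparse extrinsic reward, while the subcontroller acts every step on the dense feature-control reward, which is what yields temporally extended exploration and reusable skills.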
