IEEE Transactions on Neural Networks and Learning Systems

Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning


Abstract

One of the main concerns of deep reinforcement learning (DRL) is the data inefficiency problem, which stems both from an inability to fully utilize acquired data and from naive exploration strategies. To alleviate these problems, we propose a DRL algorithm that aims to improve data efficiency through both the utilization of unrewarded experiences and the exploration strategy, combining ideas from unsupervised auxiliary tasks, intrinsic motivation, and hierarchical reinforcement learning (HRL). Our method is based on a simple HRL architecture with a metacontroller and a subcontroller. The subcontroller is intrinsically motivated by the metacontroller to learn to control aspects of the environment, with the intention of giving the agent: 1) a neural representation that is generically useful for tasks involving manipulation of the environment and 2) the ability to explore the environment in a temporally extended manner through the control of the metacontroller. In this way, we reinterpret the notion of pixel- and feature-control auxiliary tasks as reusable skills that can be learned via an intrinsic reward. We evaluate our method on a number of Atari 2600 games. We found that it outperforms the baseline in several environments and significantly improves performance in one of the hardest games, Montezuma's Revenge, for which the ability to utilize sparse data is key. We found that the inclusion of the intrinsic reward is crucial to the improvement in performance, and that most of the benefit seems to derive from the representations learned during training.
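The abstract describes a two-level loop: a metacontroller chooses which feature of a learned representation the agent should try to control, and a subcontroller acts for a fixed horizon, rewarded intrinsically for changing that feature, while the metacontroller is trained on the extrinsic reward. The following is a minimal sketch of that structure, not the paper's implementation: the toy environment, the frozen random-projection encoder, the absolute-change intrinsic reward, and all names (ToyEnv, encode, META_HORIZON, etc.) are illustrative assumptions, and both policies are left as random placeholders where learning updates would occur.

```python
# Minimal sketch of a metacontroller/subcontroller feature-control loop.
# All components here are hypothetical stand-ins for illustration only.
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM = 64        # flattened observation size (stand-in for pixels)
N_FEATURES = 8      # size of the feature vector the agent learns to control
META_HORIZON = 10   # subcontroller steps per metacontroller goal
N_ACTIONS = 4

# Frozen random projection standing in for a learned encoder phi(s).
W = 0.1 * rng.standard_normal((N_FEATURES, OBS_DIM))

def encode(obs):
    """Map an observation to its feature vector phi(s)."""
    return W @ obs

def intrinsic_reward(phi, phi_next, k):
    """Reward the subcontroller for changing the k-th feature between steps."""
    return abs(phi_next[k] - phi[k])

class ToyEnv:
    """Trivial random-walk environment, used only to make the loop runnable."""
    def reset(self):
        self.state = rng.standard_normal(OBS_DIM)
        return self.state

    def step(self, action):
        self.state = self.state + 0.1 * rng.standard_normal(OBS_DIM)
        extrinsic = float(rng.random() < 0.01)  # sparse extrinsic reward
        return self.state, extrinsic

env = ToyEnv()
obs = env.reset()

for _ in range(5):
    # Metacontroller decision: pick a goal "control feature k".
    # (Random here; in the paper it is a learned policy.)
    k = int(rng.integers(N_FEATURES))
    meta_return = 0.0
    for t in range(META_HORIZON):
        phi = encode(obs)
        action = int(rng.integers(N_ACTIONS))  # subcontroller policy (random here)
        obs, r_ext = env.step(action)
        r_int = intrinsic_reward(phi, encode(obs), k)
        # A real subcontroller would be updated here to maximize r_int.
        meta_return += r_ext
    # A real metacontroller would be updated here on meta_return.
    print(f"goal=feature {k}, extrinsic return over horizon: {meta_return:.2f}")
```

The structural point the sketch tries to capture is the two timescales: the metacontroller acts once per horizon on the sparse extrinsic reward, while the subcontroller acts every step on the dense feature-control reward, which is what yields temporally extended exploration and reusable skills.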
