Neurocomputing

Random curiosity-driven exploration in deep reinforcement learning


Abstract

Reinforcement learning (RL) depends on carefully engineered environment rewards. However, rewards from the environment are extremely sparse for many RL tasks, making it challenging for the agent to learn skills and interact with the environment. One solution to this problem is to create intrinsic rewards for agents, making rewards dense and more suitable for learning. Recent algorithms, such as curiosity-driven exploration, usually estimate the novelty of the next state through the prediction error of dynamics models. However, these methods are typically limited by the capacity of their dynamics models. In this paper, a random curiosity-driven model using deep reinforcement learning is proposed, which uses a target network with fixed weights to maintain the stability of dynamics models and create more suitable intrinsic rewards. We integrate a parametric exploration method to further promote sufficient exploration. In addition, a deeper and more closely connected network is used to encode the pixel images for the policy gradient. By comparing our method against previous approaches in several environments, the experiments show that our method achieves state-of-the-art performance on most, but not all, of the Atari games. (c) 2020 Elsevier B.V. All rights reserved.
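The core mechanism described in the abstract, intrinsic rewards derived from prediction error against a target network with fixed weights, can be illustrated with a minimal sketch. The snippet below assumes an RND-style setup in PyTorch; the class name RandomCuriosityReward, the layer sizes, and the feature dimension are illustrative assumptions rather than the authors' exact architecture.

```python
# Minimal sketch of a fixed-weight-target intrinsic reward (RND-style).
# Sizes and names are assumptions for illustration, not the paper's exact model.
import torch
import torch.nn as nn


class RandomCuriosityReward(nn.Module):
    """Intrinsic reward = prediction error against a frozen, randomly
    initialised target network, which keeps the prediction target stable."""

    def __init__(self, obs_dim: int, feat_dim: int = 128):
        super().__init__()
        # Target network: weights are fixed after random initialisation.
        self.target = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim)
        )
        for p in self.target.parameters():
            p.requires_grad_(False)
        # Predictor network: trained to match the target's output.
        self.predictor = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim)
        )

    def forward(self, next_obs: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            target_feat = self.target(next_obs)
        pred_feat = self.predictor(next_obs)
        # Per-sample squared error serves as the intrinsic reward; its mean
        # is also the training loss for the predictor.
        return (pred_feat - target_feat).pow(2).mean(dim=-1)


# Hypothetical usage: intrinsic rewards are added to the sparse environment reward.
# obs = torch.randn(32, 84)              # batch of (flattened) observations
# r_int = RandomCuriosityReward(84)(obs) # one intrinsic reward per sample
# r_int.mean().backward()                # updates the predictor only
```

In such a setup the predictor is trained to match the frozen target on visited states, so frequently visited states yield low prediction error (low intrinsic reward) while novel states yield high error, which densifies an otherwise sparse reward signal.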
