Neurocomputing

Random curiosity-driven exploration in deep reinforcement learning


Abstract

Reinforcement learning (RL) depends on carefully engineered environment rewards. However, rewards from the environment are extremely sparse for many RL tasks, making it challenging for the agent to learn skills and interact with the environment. One solution to this problem is to create intrinsic rewards for agents, making rewards dense and more suitable for learning. Recent algorithms, such as curiosity-driven exploration, usually estimate the novelty of the next state through the prediction error of dynamics models. However, these methods are typically limited by the capacity of their dynamics models. In this paper, a random curiosity-driven model using deep reinforcement learning is proposed, which uses a target network with fixed weights to maintain the stability of dynamics models and create more suitable intrinsic rewards. We integrate a parametric exploration method to further promote sufficient exploration. In addition, a deeper and more closely connected network is used to encode the pixel images for the policy gradient. By comparing our method against previous approaches in several environments, the experiments show that our method achieves state-of-the-art performance on most, but not all, of the Atari games. (c) 2020 Elsevier B.V. All rights reserved.
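The core mechanism described in the abstract, intrinsic rewards derived from prediction error against a target network with fixed weights, can be illustrated with a minimal sketch. The snippet below assumes an RND-style setup in PyTorch; the class name RandomCuriosityReward, the layer sizes, and the feature dimension are illustrative assumptions rather than the authors' exact architecture.

```python
# Minimal sketch of a fixed-weight-target intrinsic reward (RND-style).
# Sizes and names are assumptions for illustration, not the paper's exact model.
import torch
import torch.nn as nn


class RandomCuriosityReward(nn.Module):
    """Intrinsic reward = prediction error against a frozen, randomly
    initialised target network, which keeps the prediction target stable."""

    def __init__(self, obs_dim: int, feat_dim: int = 128):
        super().__init__()
        # Target network: weights are fixed after random initialisation.
        self.target = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim)
        )
        for p in self.target.parameters():
            p.requires_grad_(False)
        # Predictor network: trained to match the target's output.
        self.predictor = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, feat_dim)
        )

    def forward(self, next_obs: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            target_feat = self.target(next_obs)
        pred_feat = self.predictor(next_obs)
        # Per-sample squared error serves as the intrinsic reward; its mean
        # is also the training loss for the predictor.
        return (pred_feat - target_feat).pow(2).mean(dim=-1)


# Hypothetical usage: intrinsic rewards are added to the sparse environment reward.
# obs = torch.randn(32, 84)              # batch of (flattened) observations
# r_int = RandomCuriosityReward(84)(obs) # one intrinsic reward per sample
# r_int.mean().backward()                # updates the predictor only
```

In such a setup the predictor is trained to match the frozen target on visited states, so frequently visited states yield low prediction error (low intrinsic reward) while novel states yield high error, which densifies an otherwise sparse reward signal.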
