Deep Reinforcement Learning with Risk-Seeking Exploration

Abstract

In most contemporary work in deep reinforcement learning (DRL), agents are trained in simulated environments. Simulated environments are not only fast and inexpensive, they are also 'safe'. By contrast, training in a real-world environment (using robots, for example) is not only slow and costly, but actions can also result in irreversible damage, either to the environment or to the agent (robot) itself. In this paper, we take advantage of the inherent safety of computer simulation by extending the Deep Q-Network (DQN) algorithm with the ability to measure and take risk. In essence, we propose a novel DRL algorithm that encourages risk-seeking behaviour to enhance information acquisition during training. We demonstrate the merit of this exploration heuristic by (i) arguing that our risk estimator implicitly captures both the parametric uncertainty and the inherent uncertainty of the environment, which are propagated back through the temporal-difference (TD) error across many time steps, and (ii) evaluating our method on three games in the Atari domain, showing that the technique works well on Montezuma's Revenge, a game that epitomises the challenge of sparse reward.
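The abstract only outlines the algorithm, so the following is a minimal tabular sketch of the general idea rather than the authors' method: alongside Q(s, a), a second table tracks a variance-style risk estimate R(s, a), updated with a Bellman-like recursion on squared TD errors (roughly R(s, a) ← δ² + γ²·max_a' R(s', a')), and action selection adds β·√R as a risk-seeking bonus. The class name, the bonus weight β, and the exact form of the recursion are illustrative assumptions, not details from the paper.

```python
import numpy as np

class RiskSeekingQLearner:
    """Tabular sketch of Q-learning with a risk-seeking exploration bonus.

    Alongside Q(s, a) we learn R(s, a), a rough estimate of the variance of
    the TD target. Its square root is added to the Q-values during action
    selection, so high-uncertainty actions are preferred during training.
    This is a hypothetical illustration of the abstract's idea, not the
    paper's actual estimator.
    """

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.99, beta=1.0):
        self.q = np.zeros((n_states, n_actions))     # value estimates
        self.risk = np.zeros((n_states, n_actions))  # TD-error second moment
        self.alpha, self.gamma, self.beta = alpha, gamma, beta

    def act(self, s):
        # Risk-seeking policy: prefer actions with large estimated uncertainty.
        scores = self.q[s] + self.beta * np.sqrt(self.risk[s])
        return int(np.argmax(scores))

    def update(self, s, a, r, s_next, done):
        target = r + (0.0 if done else self.gamma * self.q[s_next].max())
        td_error = target - self.q[s, a]
        self.q[s, a] += self.alpha * td_error
        # Bellman-style recursion on the squared TD error: downstream risk is
        # discounted and mixed in, so both parametric and environment
        # uncertainty propagate back across many time steps.
        next_risk = 0.0 if done else self.gamma ** 2 * self.risk[s_next].max()
        self.risk[s, a] += self.alpha * (td_error ** 2 + next_risk - self.risk[s, a])
```

In a training loop one would call act to choose actions and update after each transition; annealing beta toward zero as the risk estimates shrink would recover near-greedy behaviour once the useful uncertainty has been explored away.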
