Conference: International Joint Conference on Neural Networks

Efficient and Scalable Exploration via Estimation-Error

Abstract

Efficient exploration in complex environments remains a challenging problem in reinforcement learning (RL). Recent exploration algorithms based on "optimism in the face of uncertainty" or on intrinsic motivation have achieved promising performance in sparse-reward settings, but they often rely on additional structures that are hard to build for large-scale problems. This makes them impractical and hard to combine with existing RL algorithms. As a result, most state-of-the-art RL algorithms still use naive action-space noise as their exploration strategy. In this paper, we model uncertainty about the environment through the agent's ability to estimate value across the state and action space. We then parameterize this uncertainty with a neural network and use it as a reward bonus for uncertain states. This yields an end-to-end bonus that scales to complex environments at lower computational cost. To assess the effectiveness of our method, we evaluate it on challenging Atari 2600 games. Our method achieves superior or comparable exploratory performance relative to action-space noise in all environments tested, including those with sparse rewards. The results show that our exploration method can drive the agent to explore effectively even in complex environments, and that it generally outperforms naive action-space noise.
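
The abstract does not give the bonus formula, the network architecture, or any hyperparameters, so the following is only a rough illustration of the general idea and not the authors' implementation. It assumes the uncertainty estimate is trained to track the magnitude of the temporal-difference (TD) error, and it uses lookup tables as a stand-in for the neural networks mentioned in the abstract. The class name `ErrorBonusAgent` and the constants `GAMMA`, `ALPHA`, `ETA`, and `BETA` are hypothetical.

```python
import random
from collections import defaultdict

# All constants below are assumed values chosen for illustration only.
GAMMA = 0.99  # discount factor
ALPHA = 0.1   # step size for the value estimates
ETA = 0.1     # step size for the uncertainty estimates
BETA = 0.5    # weight of the exploration bonus


class ErrorBonusAgent:
    """Tabular stand-in: u[(s, a)] tracks how badly q[(s, a)] is estimated."""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.q = defaultdict(float)  # value estimates
        self.u = defaultdict(float)  # estimated magnitude of the value error

    def act(self, state):
        # Act greedily on q; exploration is driven by the bonus folded into q,
        # not by noise in the action space. Ties are broken at random.
        values = [self.q[(state, a)] for a in range(self.n_actions)]
        best = max(values)
        return random.choice([a for a, v in enumerate(values) if v == best])

    def bonus(self, state, action):
        # Reward bonus proportional to the current uncertainty estimate.
        return BETA * self.u[(state, action)]

    def update(self, state, action, reward, next_state, done):
        # Add the bonus to the environment reward before the value update.
        shaped = reward + self.bonus(state, action)
        next_value = 0.0 if done else max(
            self.q[(next_state, a)] for a in range(self.n_actions)
        )
        td_error = shaped + GAMMA * next_value - self.q[(state, action)]
        self.q[(state, action)] += ALPHA * td_error
        # Move the uncertainty estimate toward the absolute TD error
        # (an assumed training target, not taken from the abstract).
        self.u[(state, action)] += ETA * (abs(td_error) - self.u[(state, action)])
```

In this sketch, a state-action pair whose value is still poorly estimated produces large TD errors and therefore a large bonus; as the estimate improves, the bonus shrinks toward zero and the agent's behavior is governed by the environment reward alone.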
