Conference: IEEE/RSJ International Conference on Intelligent Robots and Systems

Exploration Strategy based on Validity of Actions in Deep Reinforcement Learning



Abstract

How an agent explores its environment is one of the most critical factors in its performance in reinforcement learning. Conventional exploration strategies such as the ε-greedy algorithm and Gaussian exploration noise depend purely on randomness. However, to explore complex environments efficiently, an agent must consider its training progress and the long-term usefulness of actions, which remains a major challenge in reinforcement learning. To address this challenge, we propose a novel exploration method that selects actions based on their validity. The key idea behind our method is to estimate the validity of actions by leveraging the zero-avoiding property of the Kullback-Leibler divergence, so that actions are evaluated comprehensively in terms of both exploration and exploitation. We also introduce a framework that allows an agent to explore efficiently in environments where the reward is sparse or cannot be defined intuitively. The framework combines our exploration strategy with imitation learning, using expert demonstrations to guide the agent toward the task-relevant state space. We demonstrate our exploration strategy on several tasks, ranging from classical control tasks to high-dimensional urban autonomous driving scenarios at a roundabout. The results show that our exploration strategy encourages an agent to visit the task-relevant state space to enhance the validity of actions, outperforming several previous methods.
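The abstract does not spell out how action validity is computed, so the following is only a minimal sketch of the zero-avoiding property it refers to, not the authors' method. It assumes discrete distributions given as lists; the names `kl_divergence`, `p`, `q_cover`, and `q_drop` are hypothetical and chosen for illustration. The forward divergence KL(p || q) grows without bound as q approaches zero on an outcome that p supports, so an approximation q that keeps this divergence small is pushed to keep some probability on every such outcome.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) for two discrete distributions of equal length."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # by convention, a term with p_i = 0 contributes 0
        if q_i == 0.0:
            return math.inf  # q rules out an outcome that p supports
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical target distribution over three actions.
p = [0.5, 0.5, 0.0]

# q_cover keeps mass on both actions that p supports.
q_cover = [0.4, 0.4, 0.2]

# q_drop nearly removes the second action, which p supports.
q_drop = [0.999, 0.001, 0.0]

# Forward KL penalizes q_drop far more than q_cover (zero-avoiding).
print("KL(p || q_cover) =", kl_divergence(p, q_cover))
print("KL(p || q_drop)  =", kl_divergence(p, q_drop))

# Reverse KL behaves the opposite way: it is infinite for q_cover,
# which places mass where p has none, and finite for q_drop.
print("KL(q_cover || p) =", kl_divergence(q_cover, p))
print("KL(q_drop || p)  =", kl_divergence(q_drop, p))
```

The contrast between the two directions is the point of the sketch: the forward direction favors approximations that cover every supported outcome, while the reverse direction tolerates dropping one.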
