Expert Systems with Applications

Improving exploration efficiency of deep reinforcement learning through samples produced by generative model

Abstract

Deep reinforcement learning (DRL) has made remarkable achievements in artificial intelligence. However, it relies on stochastic exploration, which suffers from low efficiency, especially in the early learning stages, where the time complexity is nearly exponential. To address this problem, an algorithm referred to as Generative Action Selection through Probability (GRASP) is proposed to improve exploration in reinforcement learning. The primary insight is to reshape the exploration space to limit the choice of exploration behaviors. More specifically, GRASP trains a generator with a generative adversarial network (GAN) to produce the exploration space from demonstrations. The agent then selects actions from the new exploration space via a modified epsilon-greedy algorithm, which allows GRASP to be incorporated into existing standard deep reinforcement learning algorithms. Experimental results show that deep reinforcement learning equipped with GRASP achieves significant improvements in simulated environments.
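
The abstract only outlines the mechanism, so the following is a minimal sketch of what the modified epsilon-greedy selection might look like, assuming a discrete action space and a generator already trained as a GAN on demonstration state-action pairs. All names here (Generator, grasp_epsilon_greedy, q_net) are illustrative assumptions, not the authors' implementation.

```python
# Sketch of GRASP-style action selection: exploration actions are sampled
# from a GAN generator trained on demonstrations, instead of uniformly.
# (Hypothetical implementation based only on the abstract's description.)
import random
import torch
import torch.nn as nn

class Generator(nn.Module):
    """GAN generator: maps a state plus noise to a distribution over actions.
    Assumed to have been trained adversarially on demonstration data."""
    def __init__(self, state_dim, action_dim, noise_dim=8, hidden=64):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim + noise_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_dim),
            nn.Softmax(dim=-1),
        )

    def forward(self, state):
        z = torch.randn(state.shape[0], self.noise_dim)
        return self.net(torch.cat([state, z], dim=-1))

def grasp_epsilon_greedy(q_net, generator, state, epsilon):
    """Modified epsilon-greedy: with probability epsilon, sample an action
    from the generator's (reshaped) exploration space; otherwise exploit."""
    if random.random() < epsilon:
        probs = generator(state.unsqueeze(0)).squeeze(0)  # reshaped exploration space
        return torch.multinomial(probs, 1).item()         # sample within it
    with torch.no_grad():
        return q_net(state.unsqueeze(0)).argmax(dim=-1).item()

# Example usage with a stand-in Q-network (4-dim state, 3 discrete actions):
state_dim, action_dim = 4, 3
gen = Generator(state_dim, action_dim)    # assume trained on demonstrations
q_net = nn.Linear(state_dim, action_dim)  # placeholder for a DQN-style network
action = grasp_epsilon_greedy(q_net, gen, torch.zeros(state_dim), epsilon=0.3)
```

Replacing the uniform random branch with generator samples is the sense in which GRASP "reshapes" the exploration space: random exploration is confined to actions resembling the demonstrations, which is why it plugs into standard epsilon-greedy-based DRL algorithms.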