A Dynamic Adjusting Reward Function Method for Deep Reinforcement Learning with Adjustable Parameters

Abstract

In deep reinforcement learning, networks often converge slowly and are easily trapped in locally optimal solutions. For environments with reward saltation, we propose a magnify saltatory reward (MSR) algorithm with variable parameters, designed from the perspective of sample usage. MSR dynamically adjusts the rewards of experiences that exhibit reward saltation in the experience pool, thereby increasing the agent's utilization of these experiences. We conducted experiments in a simulated obstacle-avoidance search environment for an unmanned aerial vehicle and compared the results of deep Q-network (DQN), double DQN, and dueling DQN after adding MSR. The results demonstrate that, after adding MSR, the algorithms converge faster and can obtain the globally optimal solution easily.
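The abstract does not give the MSR update rule, so the following is only a minimal sketch of the general idea of magnifying saltatory rewards inside an experience pool. Everything specific in it is an assumption made for illustration and is not taken from the paper: the class name `MagnifyingReplayBuffer`, the rule that a "saltation" is a step-to-step reward change larger than `jump_threshold`, and the choice to multiply such rewards by an adjustable factor `magnify`.

```python
import random
from collections import deque


class MagnifyingReplayBuffer:
    """Toy experience pool that magnifies rewards which jump abruptly.

    Assumptions (hypothetical, not from the paper): a reward "saltation"
    is detected when the reward differs from the previous step's reward
    by more than `jump_threshold`; the stored reward is then multiplied
    by `magnify`, which may be changed during training.
    """

    def __init__(self, capacity=10000, jump_threshold=1.0, magnify=2.0):
        self.buffer = deque(maxlen=capacity)
        self.jump_threshold = jump_threshold
        self.magnify = magnify  # adjustable parameter
        self.prev_reward = None  # last raw reward in the current episode

    def add(self, state, action, reward, next_state, done):
        """Store one transition and return the reward actually stored."""
        is_jump = (
            self.prev_reward is not None
            and abs(reward - self.prev_reward) > self.jump_threshold
        )
        stored = reward * self.magnify if is_jump else reward
        self.buffer.append((state, action, stored, next_state, done))
        # Reset the comparison point at episode boundaries.
        self.prev_reward = None if done else reward
        return stored

    def sample(self, batch_size):
        """Uniformly sample a minibatch of stored transitions."""
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)
```

A value-based learner such as DQN could draw its minibatches from `sample` in place of a plain replay buffer; whether this matches the paper's actual adjustment rule cannot be determined from the abstract alone.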

Bibliographic information

  • Source
    Mathematical Problems in Engineering | 2019, Issue 23 | 7619483.1-7619483.10 | 10 pages
  • Author affiliations

    Northwestern Polytech Univ Sch Elect & Informat Xian 710129 Shaanxi Peoples R China;

    Northwestern Polytech Univ Sch Elect & Informat Xian 710129 Shaanxi Peoples R China;

    Northwestern Polytech Univ Sch Elect & Informat Xian 710129 Shaanxi Peoples R China;

    Northwestern Polytech Univ Sch Elect & Informat Xian 710129 Shaanxi Peoples R China;

  • Original format: PDF
  • Language: English
