...

A novel multi-step Q-learning method to improve data efficiency for deep reinforcement learning



Abstract

Deep reinforcement learning (DRL) algorithms with experience replay have been used to solve many sequential learning problems. In practice, however, DRL algorithms still suffer from data inefficiency, which limits their applicability in many scenarios and makes them costly to deploy on real-world problems. To improve the data efficiency of DRL, this paper proposes a new multi-step method. Unlike traditional algorithms, the proposed method uses a new return function, which alters the discount applied to future rewards while decreasing the impact of the immediate reward when selecting the action for the current state. This approach has the potential to use reward data more efficiently. By combining the proposed method with two classic DRL algorithms, deep Q-networks (DQN) and double deep Q-networks (DDQN), two novel algorithms are proposed for improving the efficiency of learning from experience replay. The performance of the proposed algorithms, expected n-step DQN (EnDQN) and expected n-step DDQN (EnDDQN), is validated in two simulation environments, CartPole and DeepTraffic. The experimental results demonstrate that the proposed multi-step methods greatly improve the data efficiency of DRL agents and further improve the performance of the classic DRL algorithms into whose training they are incorporated. (C) 2019 Elsevier B.V. All rights reserved.
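The abstract only sketches the proposed return function (it reweights the immediate reward and the discounting of later rewards) and does not give its exact form. As a rough, hedged illustration of the kind of quantity such methods compute, the sketch below shows a standard n-step TD target built from replayed transitions, as used in n-step variants of DQN; the function name n_step_target, the transition layout, and the bootstrap value are assumptions for this example, not the authors' formula.

```python
# Minimal sketch (not the paper's exact return function): a standard
# n-step TD target computed from a slice of replayed transitions.
from typing import List, Tuple

def n_step_target(transitions: List[Tuple[float, bool]],
                  bootstrap_q: float,
                  gamma: float = 0.99) -> float:
    """transitions: [(reward, done), ...] for steps t .. t+n-1.
    bootstrap_q: max_a Q(s_{t+n}, a) from the target network.
    Returns sum_k gamma^k * r_{t+k} + gamma^n * bootstrap_q."""
    target = 0.0
    discount = 1.0
    for reward, done in transitions:
        target += discount * reward
        discount *= gamma
        if done:            # episode ended inside the n-step window:
            return target   # do not bootstrap past a terminal state
    return target + discount * bootstrap_q

# Example: 3-step window with reward 1.0 per step and bootstrap value 5.0
print(n_step_target([(1.0, False), (1.0, False), (1.0, False)], 5.0))
```

The paper's contribution, as described above, is to change how the immediate and later rewards are weighted inside such a target; that reweighting is not reproduced here.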

Record Details

  • Source
    Knowledge-Based Systems | 2019, Issue 1 | pp. 107-117 | 11 pages
  • Author Affiliations

    South China Univ Technol, Coll Automat Sci & Engn, Guangzhou 510641, Guangdong, Peoples R China;

    South China Univ Technol, Coll Automat Sci & Engn, Guangzhou 510641, Guangdong, Peoples R China;

    South China Univ Technol, Coll Automat Sci & Engn, Guangzhou 510641, Guangdong, Peoples R China;

    Guangdong Univ Technol, Sch Automat, Guangzhou 510006, Guangdong, Peoples R China;

    South China Univ Technol, Coll Automat Sci & Engn, Guangzhou 510641, Guangdong, Peoples R China;

    South China Univ Technol, Coll Automat Sci & Engn, Guangzhou 510641, Guangdong, Peoples R China;

    South China Univ Technol, Coll Automat Sci & Engn, Guangzhou 510641, Guangdong, Peoples R China;

    South China Univ Technol, Coll Automat Sci & Engn, Guangzhou 510641, Guangdong, Peoples R China;

  • Indexing Information
  • Original Format: PDF
  • Language: eng
  • CLC Classification
  • Keywords

    Deep reinforcement learning; Robotics; Multi-step methods; Data efficiency;


