...

A novel multi-step Q-learning method to improve data efficiency for deep reinforcement learning



Abstract

Deep reinforcement learning (DRL) algorithms with experience replay have been used to solve many sequential learning problems. In practice, however, DRL algorithms still suffer from data inefficiency, which limits their applicability in many scenarios and makes them costly to deploy on real-world problems. To improve the data efficiency of DRL, this paper proposes a new multi-step method. Unlike traditional algorithms, the proposed method uses a new return function, which alters the discount applied to future rewards while decreasing the impact of the immediate reward when selecting the action for the current state. This approach has the potential to use reward data more efficiently. By combining the proposed method with two classic DRL algorithms, deep Q-networks (DQN) and double deep Q-networks (DDQN), two novel algorithms are proposed for improving the efficiency of learning from experience replay. The performance of the proposed algorithms, expected n-step DQN (EnDQN) and expected n-step DDQN (EnDDQN), is validated in two simulation environments, CartPole and DeepTraffic. The experimental results demonstrate that the proposed multi-step methods greatly improve the data efficiency of DRL agents and further improve the performance of the classic DRL algorithms into whose training they are incorporated. (C) 2019 Elsevier B.V. All rights reserved.
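The abstract only sketches the proposed return function (it reweights the immediate reward and the discounting of later rewards) and does not give its exact form. As a rough, hedged illustration of the kind of quantity such methods compute, the sketch below shows a standard n-step TD target built from replayed transitions, as used in n-step variants of DQN; the function name n_step_target, the transition layout, and the bootstrap value are assumptions for this example, not the authors' formula.

```python
# Minimal sketch (not the paper's exact return function): a standard
# n-step TD target computed from a slice of replayed transitions.
from typing import List, Tuple

def n_step_target(transitions: List[Tuple[float, bool]],
                  bootstrap_q: float,
                  gamma: float = 0.99) -> float:
    """transitions: [(reward, done), ...] for steps t .. t+n-1.
    bootstrap_q: max_a Q(s_{t+n}, a) from the target network.
    Returns sum_k gamma^k * r_{t+k} + gamma^n * bootstrap_q."""
    target = 0.0
    discount = 1.0
    for reward, done in transitions:
        target += discount * reward
        discount *= gamma
        if done:            # episode ended inside the n-step window:
            return target   # do not bootstrap past a terminal state
    return target + discount * bootstrap_q

# Example: 3-step window with reward 1.0 per step and bootstrap value 5.0
print(n_step_target([(1.0, False), (1.0, False), (1.0, False)], 5.0))
```

The paper's contribution, as described above, is to change how the immediate and later rewards are weighted inside such a target; that reweighting is not reproduced here.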

Record Details

  • Source
    Knowledge-Based Systems | 2019, Issue 1 | pp. 107-117 | 11 pages
  • Author Affiliations

    South China Univ Technol, Coll Automat Sci & Engn, Guangzhou 510641, Guangdong, Peoples R China;

    South China Univ Technol, Coll Automat Sci & Engn, Guangzhou 510641, Guangdong, Peoples R China;

    South China Univ Technol, Coll Automat Sci & Engn, Guangzhou 510641, Guangdong, Peoples R China;

    Guangdong Univ Technol, Sch Automat, Guangzhou 510006, Guangdong, Peoples R China;

    South China Univ Technol, Coll Automat Sci & Engn, Guangzhou 510641, Guangdong, Peoples R China;

    South China Univ Technol, Coll Automat Sci & Engn, Guangzhou 510641, Guangdong, Peoples R China;

    South China Univ Technol, Coll Automat Sci & Engn, Guangzhou 510641, Guangdong, Peoples R China;

    South China Univ Technol, Coll Automat Sci & Engn, Guangzhou 510641, Guangdong, Peoples R China;

  • Indexing Information
  • Original Format: PDF
  • Language: eng
  • CLC Classification
  • Keywords

    Deep reinforcement learning; Robotics; Multi-step methods; Data efficiency;


