IEICE Transactions on Information and Systems

Deep Reinforcement Learning with Sarsa and Q-Learning: A Hybrid Approach



Abstract

The commonly used Deep Q Networks algorithm is known to overestimate action values under certain conditions, and it has also been proved that these overestimations harm performance and may cause instability and divergence of learning. In this paper, we present the Deep Sarsa and Q Networks (DSQN) algorithm, which can be considered an enhancement of the Deep Q Networks algorithm. First, the DSQN algorithm adopts the experience replay and target network techniques of Deep Q Networks to improve the stability of the neural networks. Second, a double estimator is used in Q-learning to reduce overestimations. In particular, we introduce Sarsa learning into Deep Q Networks to remove overestimations further. Finally, the DSQN algorithm is evaluated on the cart-pole balancing, mountain car, and lunar lander control tasks from the OpenAI Gym. The empirical evaluation results show that the proposed method leads to reduced overestimations, a more stable learning process, and improved performance.
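The experience replay mechanism mentioned in the abstract can be illustrated with a minimal sketch: transitions are stored in a fixed-capacity buffer and sampled uniformly at random, which breaks the temporal correlation between consecutive updates. This is a generic illustration of the Deep Q Networks technique, not the authors' implementation; for the Sarsa component of DSQN, the stored transition would presumably also need to include the next action, which is an assumption on our part.

```python
import random
from collections import deque


class ReplayBuffer:
    """Minimal experience-replay buffer in the style of Deep Q Networks.

    Transitions are appended to a bounded deque (oldest entries are
    evicted once capacity is reached) and mini-batches are drawn
    uniformly at random for training.
    """

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Store one transition; old transitions fall off automatically.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random mini-batch, decorrelating consecutive updates.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

A target network would complement this buffer by holding a periodically synchronized copy of the online network's weights, so that the regression targets move slowly during training.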
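To make the overestimation-reduction idea concrete, the following sketch combines a double-Q-learning target (the online network selects the greedy next action, the target network evaluates it) with an on-policy Sarsa target (the target network evaluates the action actually taken next). The mixing weight `beta`, the function name, and the specific combination rule are all illustrative assumptions; the paper's exact DSQN update may differ.

```python
import numpy as np


def dsqn_target(reward, next_q_online, next_q_target, next_action,
                gamma=0.99, beta=0.5):
    """Hypothetical hybrid TD target (illustrative sketch only).

    next_q_online / next_q_target: per-action value estimates for the
    next state from the online and target networks, respectively.
    next_action: the action actually taken in the next state (Sarsa).
    beta: assumed mixing weight between the Sarsa and double-Q terms.
    """
    # Double Q-learning: online net selects, target net evaluates,
    # which counters the overestimation of the single-max operator.
    greedy = int(np.argmax(next_q_online))
    double_q = next_q_target[greedy]
    # Sarsa: evaluate the on-policy action actually taken next.
    sarsa_q = next_q_target[next_action]
    return reward + gamma * ((1.0 - beta) * double_q + beta * sarsa_q)
```

With `beta = 0` this reduces to a double-Q-learning target, and with `beta = 1` to a pure Sarsa target, so the weight interpolates between the off-policy and on-policy estimates.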
