
Value-based Algorithms Optimization with Discounted Multiple-step Learning Method in Deep Reinforcement Learning


Abstract

Value-based algorithms have been demonstrated on a range of deep reinforcement learning tasks. However, they suffer from problems of stability, overestimation, and convergence, which severely limit their application in real-world environments. In n-step learning, the truncated N-step return is used as part of a multiple-step target to speed up learning and mitigate these defects, but such algorithms are still far from practical application. In this paper, we propose a straightforward optimization method, the Discounted Multiple-step Learning Method (DMLM), which improves the performance of value-based algorithms by applying a discount factor to the truncated N-step return and shows better results in our experiments. In this method, the discounted truncated N-step return, rather than the accumulated discounted reward, forms the main part of the target when computing the TD-error used as the loss function of the evaluation network. In the experimental part, we compare against value-based algorithms without this method and show that DMLM yields more accurate predictions of the value function, thereby outperforming other optimization methods in terms of stability, overestimation, and convergence.
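As a rough illustration of the target described in the abstract, the Python sketch below computes a discounted truncated N-step target. The exact placement of the extra discount (the parameter `lam` here) is an assumption, since the abstract only states that the truncated N-step return is discounted before being combined with the bootstrap value from the target network.

```python
def dmlm_target(rewards, q_bootstrap, gamma=0.99, lam=0.9):
    """Sketch of a discounted truncated n-step target.

    rewards     : the n rewards r_t, ..., r_{t+n-1} along the sampled segment
    q_bootstrap : max_a Q_target(s_{t+n}, a) from the target network
    gamma       : the usual per-step discount factor
    lam         : extra discount applied to the truncated n-step return
                  (hypothetical placement; the abstract only says this
                  return is discounted)
    """
    n = len(rewards)
    # Truncated n-step return: sum_{k=0}^{n-1} gamma^k * r_{t+k}
    truncated_return = sum((gamma ** k) * r for k, r in enumerate(rewards))
    # DMLM-style target: discount the truncated return, then bootstrap as usual
    return lam * truncated_return + (gamma ** n) * q_bootstrap


# Toy usage: a 3-step reward segment with a bootstrap value of 1.5
print(dmlm_target([1.0, 0.0, 0.5], q_bootstrap=1.5))
```

This target would replace the single-step TD target when computing the loss of the evaluation network; with `lam = 1.0` it reduces to the ordinary truncated n-step target.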
