International Conference on Electrical, Control and Instrumentation Engineering

Deep Reinforcement Learning with Robust Deep Deterministic Policy Gradient



Abstract

Recently, Deep Deterministic Policy Gradient (DDPG) has become a popular deep reinforcement learning algorithm for continuous control problems such as autonomous driving and robotics. Although DDPG can produce very good results, it has drawbacks: it can become unstable, and its performance depends heavily on finding the correct hyperparameters for the task at hand. The DDPG algorithm also risks overestimating the Q values in the critic (value) network. The accumulation of estimation errors over time can cause the reinforcement learning agent to become trapped in a local optimum or to suffer catastrophic forgetting. Twin Delayed DDPG (TD3) mitigates the overestimation bias problem, but may not reach its full performance because it introduces an underestimation bias. In this paper, Twin Average Delayed DDPG (TAD3) is proposed as a specific adaptation of TD3, and the resulting algorithm is shown to perform better than TD3 in a challenging continuous control environment.
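The abstract contrasts TD3's handling of overestimation bias with an averaging-based variant. Below is a minimal PyTorch sketch (not the paper's code) that contrasts the standard TD3 critic target, which takes the minimum of two target critics, with an averaged target of the kind a "Twin Average Delayed" variant might use. The abstract does not give TAD3's exact formulation, so the averaging rule, network sizes, and hyperparameters here are illustrative assumptions.

```python
# Sketch only: contrasts TD3's clipped double-Q target (min of twin critics)
# with a hypothetical averaged target. Not the paper's TAD3 implementation.
import torch
import torch.nn as nn


class Critic(nn.Module):
    """Simple Q(s, a) network used only for illustration."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def td3_target(q1_t, q2_t, next_state, next_action, reward, not_done, gamma=0.99):
    """TD3-style target: the minimum of the twin target critics curbs overestimation."""
    with torch.no_grad():
        q1 = q1_t(next_state, next_action)
        q2 = q2_t(next_state, next_action)
        return reward + not_done * gamma * torch.min(q1, q2)


def averaged_target(q1_t, q2_t, next_state, next_action, reward, not_done, gamma=0.99):
    """Hypothetical averaged target: softens the pessimism of the min operator."""
    with torch.no_grad():
        q1 = q1_t(next_state, next_action)
        q2 = q2_t(next_state, next_action)
        return reward + not_done * gamma * 0.5 * (q1 + q2)


if __name__ == "__main__":
    # Illustrative shapes only; no training loop.
    state_dim, action_dim, batch = 17, 6, 32
    q1_t, q2_t = Critic(state_dim, action_dim), Critic(state_dim, action_dim)
    s2 = torch.randn(batch, state_dim)
    a2 = torch.randn(batch, action_dim)
    r = torch.randn(batch, 1)
    nd = torch.ones(batch, 1)
    print(td3_target(q1_t, q2_t, s2, a2, r, nd).shape)       # torch.Size([32, 1])
    print(averaged_target(q1_t, q2_t, s2, a2, r, nd).shape)  # torch.Size([32, 1])
```

The minimum in TD3's target deliberately errs on the side of pessimism to suppress overestimation; averaging the two critics is one simple way to trade that pessimism against the optimism of a single critic, which is the kind of bias trade-off the abstract attributes to TAD3.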
