International Conference on Electrical, Control and Instrumentation Engineering

Deep Reinforcement Learning with Robust Deep Deterministic Policy Gradient



Abstract

Recently, Deep Deterministic Policy Gradient (DDPG) has become a popular deep reinforcement learning algorithm for continuous control problems such as autonomous driving and robotics. Although DDPG can produce very good results, it has drawbacks: it can become unstable, and its performance depends heavily on finding the correct hyperparameters for the task at hand. The DDPG algorithm also risks overestimating the Q values in the critic (value) network. The accumulation of estimation errors over time can cause the reinforcement learning agent to become trapped in a local optimum or to suffer catastrophic forgetting. Twin Delayed DDPG (TD3) mitigates the overestimation bias problem, but may not reach its full performance because it introduces an underestimation bias. In this paper, Twin Average Delayed DDPG (TAD3) is proposed as a specific adaptation of TD3, and the resulting algorithm is shown to perform better than TD3 in a challenging continuous control environment.
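The abstract contrasts TD3's handling of overestimation bias with an averaging-based variant. Below is a minimal PyTorch sketch (not the paper's code) that contrasts the standard TD3 critic target, which takes the minimum of two target critics, with an averaged target of the kind a "Twin Average Delayed" variant might use. The abstract does not give TAD3's exact formulation, so the averaging rule, network sizes, and hyperparameters here are illustrative assumptions.

```python
# Sketch only: contrasts TD3's clipped double-Q target (min of twin critics)
# with a hypothetical averaged target. Not the paper's TAD3 implementation.
import torch
import torch.nn as nn


class Critic(nn.Module):
    """Simple Q(s, a) network used only for illustration."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def td3_target(q1_t, q2_t, next_state, next_action, reward, not_done, gamma=0.99):
    """TD3-style target: the minimum of the twin target critics curbs overestimation."""
    with torch.no_grad():
        q1 = q1_t(next_state, next_action)
        q2 = q2_t(next_state, next_action)
        return reward + not_done * gamma * torch.min(q1, q2)


def averaged_target(q1_t, q2_t, next_state, next_action, reward, not_done, gamma=0.99):
    """Hypothetical averaged target: softens the pessimism of the min operator."""
    with torch.no_grad():
        q1 = q1_t(next_state, next_action)
        q2 = q2_t(next_state, next_action)
        return reward + not_done * gamma * 0.5 * (q1 + q2)


if __name__ == "__main__":
    # Illustrative shapes only; no training loop.
    state_dim, action_dim, batch = 17, 6, 32
    q1_t, q2_t = Critic(state_dim, action_dim), Critic(state_dim, action_dim)
    s2 = torch.randn(batch, state_dim)
    a2 = torch.randn(batch, action_dim)
    r = torch.randn(batch, 1)
    nd = torch.ones(batch, 1)
    print(td3_target(q1_t, q2_t, s2, a2, r, nd).shape)       # torch.Size([32, 1])
    print(averaged_target(q1_t, q2_t, s2, a2, r, nd).shape)  # torch.Size([32, 1])
```

The minimum in TD3's target deliberately errs on the side of pessimism to suppress overestimation; averaging the two critics is one simple way to trade that pessimism against the optimism of a single critic, which is the kind of bias trade-off the abstract attributes to TAD3.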
