SAE New Energy Intelligent Connected Vehicle Technology Conference

Autopilot Strategy Based on Improved DDPG Algorithm



Abstract

Deep Deterministic Policy Gradient (DDPG) is a Deep Reinforcement Learning algorithm. Because it performs well in continuous motion control, the DDPG algorithm has been applied to the field of self-driving. To address the instability of the DDPG algorithm during training, its low training efficiency, and its slow convergence rate, an improved DDPG algorithm based on segmented experience replay is presented. Building on the DDPG algorithm, segmented experience replay selects training experience by importance according to the training progress, improving the training efficiency and stability of the trained model. The algorithm was tested in TORCS, an open-source 3D car racing simulator. The simulation results demonstrate that training stability is significantly improved compared with the DDPG and DQN algorithms, and that the average return is about 46% higher than the DDPG algorithm and about 55% higher than the DQN algorithm.
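The abstract names the technique but not its details. Below is a minimal sketch of what a segmented replay buffer of this kind might look like, assuming two segments split by a reward threshold and a sampling mix that shifts toward the high-importance segment as training progresses. The class name SegmentedReplayBuffer, the reward_threshold parameter, and the progress-based mixing rule are illustrative assumptions, not the paper's exact method.

```python
import random
from collections import deque

class SegmentedReplayBuffer:
    """Sketch of segmented experience replay (assumed design, not the
    paper's exact scheme): transitions are split into a high-reward
    and a low-reward segment, and the fraction of each batch drawn
    from the high-reward segment grows with training progress."""

    def __init__(self, capacity=100_000, reward_threshold=0.0):
        self.high = deque(maxlen=capacity // 2)  # "important" transitions
        self.low = deque(maxlen=capacity // 2)   # ordinary transitions
        self.reward_threshold = reward_threshold

    def add(self, state, action, reward, next_state, done):
        # Assign each transition to a segment by its immediate reward.
        transition = (state, action, reward, next_state, done)
        if reward > self.reward_threshold:
            self.high.append(transition)
        else:
            self.low.append(transition)

    def sample(self, batch_size, progress):
        """progress in [0, 1]: fraction of total training completed."""
        # Early in training, sample mostly ordinary experience; later,
        # bias toward the high-importance segment.
        n_high = min(int(batch_size * progress), len(self.high))
        n_low = min(batch_size - n_high, len(self.low))
        batch = (random.sample(list(self.high), n_high)
                 + random.sample(list(self.low), n_low))
        random.shuffle(batch)
        return batch  # may be smaller than batch_size if segments are sparse
```

In a DDPG training loop, progress would typically be step / total_steps, so early updates draw mostly from the ordinary segment while later updates increasingly draw the high-importance transitions.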
