Path Planning for Mobile Robots Based on Deep Reinforcement Learning

Abstract

To address the slow convergence of the conventional deep Q-network (DQN) when a robot explores a complex, unknown environment, this paper proposes an improved deep double Q-network method built on the dueling network architecture (Improved Dueling Deep Double Q-Network, IDDDQN). The mobile robot uses the improved DDQN network structure to estimate the state-action value function of its three actions, updates the network parameters, and obtains the corresponding Q-values through training. Using an exploration strategy that combines the Boltzmann distribution with ε-greedy, the robot selects an optimal action and moves to the next observation. The data collected during learning are stored in the experience replay memory through an improved resampling preferential mechanism, and the network is trained on mini-batches drawn from that memory. Experimental results show that, compared with the basic DDQN algorithm, a robot trained with IDDDQN adapts to the unknown environment more quickly, the network converges faster, the success rate of reaching the target position increases by more than three times, and the robot is better able to obtain the optimal path in an unknown, complex environment.
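
The abstract names three building blocks: a dueling network head, a double Q-network target, and an exploration policy that mixes the Boltzmann distribution with ε-greedy. The following is a minimal Python sketch of those three pieces, not the authors' implementation. The dueling aggregation (mean-subtracted advantages) and the Double DQN target are the standard forms from the literature. The function names, the default `gamma` and `temperature` values, and in particular the way the two exploration rules are combined (Boltzmann sampling on the exploratory branch, greedy choice otherwise) are assumptions made for illustration, since the abstract does not specify them. The improved resampling preferential mechanism is not sketched, because the abstract gives no detail about it.

```python
import math
import random


def dueling_q(value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) with mean-subtracted advantages."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def argmax(values):
    """Index of the largest entry (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double DQN target: the online net picks the action, the target net scores it."""
    if done:
        return reward
    best_action = argmax(next_q_online)
    return reward + gamma * next_q_target[best_action]


def boltzmann_probs(q_values, temperature=1.0):
    """Softmax over Q-values; subtracting the max keeps exp() numerically stable."""
    q_max = max(q_values)
    exps = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, epsilon, temperature=1.0, rng=random):
    """ASSUMED combination: explore via Boltzmann sampling with prob. epsilon, else act greedily."""
    if rng.random() < epsilon:
        probs = boltzmann_probs(q_values, temperature)
        return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
    return argmax(q_values)


if __name__ == "__main__":
    # Three actions, as in the abstract; the numbers are made up for illustration.
    q = dueling_q(value=1.0, advantages=[0.0, 1.0, 2.0])
    action = select_action(q, epsilon=0.1, temperature=0.5)
    target = double_dqn_target(1.0, False, [0.1, 0.9, 0.3], [5.0, 2.0, 7.0])
    print(q, action, target)
```

Under this assumed combination, exploratory steps favor actions with higher estimated Q-values instead of choosing uniformly at random, which is one plausible reading of "combining" the two strategies; other schedules, such as switching from one rule to the other over the course of training, would fit the abstract equally well.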

Bibliographic record

  • Source
    Computer Engineering and Applications (《计算机工程与应用》) | 2019, Issue 13 | pp. 15-19, 157 | 6 pages in total
  • Author affiliations

    1. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China;

    2. Hebei Provincial Key Laboratory of Big Data Computing, Hebei University of Technology, Tianjin 300401, China;

    3. Hebei University of Engineering, Handan, Hebei 056038, China

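  • Original format: PDF
  • Language of main text: Chinese
  • CLC classification: Applications in other areas
  • Keywords

    Deep double Q-network (DDQN); dueling network architecture; resampling preferential mechanism; Boltzmann distribution; ε-greedy strategy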
