Robotics and Autonomous Systems

Fixed-Wing UAVs flocking in continuous spaces: A deep reinforcement learning approach

Abstract

Fixed-Wing UAVs (Unmanned Aerial Vehicles) flocking is still a challenging problem due to the kinematics complexity and environmental dynamics. In this paper, we solve the leader-followers flocking problem using a novel deep reinforcement learning algorithm that can generate roll angle and velocity commands by training an end-to-end controller in continuous state and action spaces. Specifically, we choose CACLA (Continuous Actor-Critic Learning Automation) as the base algorithm and we use the multi-layer perceptron to represent both the actor and the critic. Besides, we further improve the learning efficiency by using the experience replay technique that stores the training data in the experience memory and samples from the memory as needed. We have compared the performance of the proposed CACER (Continuous Actor-Critic with Experience Replay) algorithm with benchmark algorithms such as DDPG and double DQN in numerical simulation, and we have demonstrated the performance of the learned optimal policy in semi-physical simulation without any parameter tuning. (C) 2020 Elsevier B.V. All rights reserved.
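Below is a minimal sketch of what a CACLA-style actor-critic with experience replay (the CACER idea described in the abstract) could look like, assuming PyTorch, a small multi-layer perceptron for both the actor and the critic, and a 2-D action consisting of roll-angle and velocity commands. The class name, state dimension, layer sizes, exploration noise, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Illustrative CACER-style sketch (assumptions throughout, not the paper's code).
import random
from collections import deque

import torch
import torch.nn as nn


def mlp(in_dim, out_dim, hidden=64):
    """Small multi-layer perceptron used for both the actor and the critic."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


class CACERAgent:
    def __init__(self, state_dim=8, action_dim=2, gamma=0.99, sigma=0.1,
                 buffer_size=100_000, batch_size=64, lr=1e-3):
        self.actor = mlp(state_dim, action_dim)   # outputs [roll command, velocity command]
        self.critic = mlp(state_dim, 1)           # state value V(s)
        self.actor_opt = torch.optim.Adam(self.actor.parameters(), lr=lr)
        self.critic_opt = torch.optim.Adam(self.critic.parameters(), lr=lr)
        self.buffer = deque(maxlen=buffer_size)   # experience replay memory
        self.gamma, self.sigma, self.batch_size = gamma, sigma, batch_size

    def act(self, state):
        # Gaussian exploration around the deterministic actor output.
        with torch.no_grad():
            mean = self.actor(torch.as_tensor(state, dtype=torch.float32))
            return (mean + self.sigma * torch.randn_like(mean)).numpy()

    def store(self, s, a, r, s_next, done):
        # Store one transition in the experience memory.
        self.buffer.append((s, a, r, s_next, float(done)))

    def update(self):
        if len(self.buffer) < self.batch_size:
            return
        batch = random.sample(self.buffer, self.batch_size)  # sample from replay memory
        s, a, r, s2, d = (torch.as_tensor(x, dtype=torch.float32)
                          for x in map(list, zip(*batch)))

        # One-step TD target and TD error, as in CACLA.
        v = self.critic(s).squeeze(-1)
        with torch.no_grad():
            target = r + self.gamma * (1.0 - d) * self.critic(s2).squeeze(-1)
        delta = (target - v).detach()

        # Critic: regress V(s) toward the TD target.
        critic_loss = nn.functional.mse_loss(v, target)
        self.critic_opt.zero_grad()
        critic_loss.backward()
        self.critic_opt.step()

        # Actor (CACLA rule): only on transitions with positive TD error,
        # pull the actor output toward the exploratory action actually taken.
        positive = delta > 0
        if positive.any():
            actor_loss = nn.functional.mse_loss(self.actor(s)[positive], a[positive])
            self.actor_opt.zero_grad()
            actor_loss.backward()
            self.actor_opt.step()
```

In a training loop under these assumptions, each leader-follower transition would be stored with store() and update() called once per step; after training, act() with the exploration noise set to zero maps the flocking state directly to roll and velocity commands.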