IEEE Robotics and Automation Letters

A Two-Stage Reinforcement Learning Approach for Multi-UAV Collision Avoidance Under Imperfect Sensing



Abstract

Unlike autonomous ground vehicles (AGVs), unmanned aerial vehicles (UAVs) have a higher-dimensional configuration space, which makes motion planning for multiple UAVs a challenging task. In addition, uncertainties and noise are more significant in UAV scenarios, which increases the difficulty of autonomous multi-UAV navigation. In this letter, we propose a two-stage reinforcement learning (RL) based multi-UAV collision avoidance approach that does not explicitly model the uncertainty and noise in the environment. Our goal is to train a policy that plans collision-free trajectories from local noisy observations. However, collision avoidance policies learned through RL usually suffer from high variance and low reproducibility, because, unlike supervised learning, RL does not have a fixed training set with ground-truth labels. To address these issues, we introduce a two-stage training method for RL-based collision avoidance. In the first stage, we optimize the policy with a supervised loss function that encourages the agent to follow the well-known reciprocal collision avoidance strategy. In the second stage, we refine the policy using policy gradient. We validate our policy in a variety of simulated scenarios, and extensive numerical simulations demonstrate that it generates time-efficient, collision-free paths under imperfect sensing and handles noisy local observations with unknown noise levels well.
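The two-stage scheme described in the abstract can be sketched in miniature. The toy 2-D point-agent setup, the linear policy, and the `expert_velocity` stand-in for a reciprocal-avoidance rule below are all illustrative assumptions, not the authors' actual implementation: stage 1 regresses the policy toward the expert rule with a supervised loss, and stage 2 refines it with a REINFORCE-style policy gradient under observation noise.

```python
# Hypothetical sketch of the two-stage training scheme: supervised
# pre-training toward a reciprocal-avoidance expert, then policy-gradient
# refinement. A toy 2-D single-neighbor setting, not the paper's system.
import numpy as np

rng = np.random.default_rng(0)

def expert_velocity(pos, goal, neighbor):
    """Toy stand-in for a reciprocal collision-avoidance rule:
    head toward the goal, steering away from a nearby neighbor."""
    to_goal = goal - pos
    v = to_goal / (np.linalg.norm(to_goal) + 1e-8)
    away = pos - neighbor
    d = np.linalg.norm(away)
    if d < 1.0:  # inside the avoidance radius, add a repulsive term
        v += (1.0 - d) * away / (d + 1e-8)
    return v

class LinearPolicy:
    """Maps a 6-D observation (pos, goal, neighbor) to a 2-D velocity."""
    def __init__(self):
        self.W = rng.normal(scale=0.1, size=(2, 6))

    def act(self, obs):
        return self.W @ obs

def supervised_stage(policy, steps=2000, lr=1e-2):
    """Stage 1: SGD on 0.5*||policy(obs) - expert(obs)||^2."""
    for _ in range(steps):
        pos, goal, nb = rng.normal(size=2), rng.normal(size=2), rng.normal(size=2)
        obs = np.concatenate([pos, goal, nb])
        target = expert_velocity(pos, goal, nb)
        pred = policy.act(obs)
        # gradient of the squared loss w.r.t. W is outer(pred - target, obs)
        policy.W -= lr * np.outer(pred - target, obs)

def pg_stage(policy, episodes=200, lr=1e-3, sigma=0.1, noise=0.05):
    """Stage 2: one-step REINFORCE with Gaussian exploration and noisy obs."""
    for _ in range(episodes):
        pos, goal, nb = rng.normal(size=2), rng.normal(size=2), rng.normal(size=2)
        obs = np.concatenate([pos, goal, nb]) + rng.normal(scale=noise, size=6)
        mean = policy.act(obs)
        action = mean + rng.normal(scale=sigma, size=2)
        new_pos = pos + 0.1 * action
        # reward: progress toward the goal, penalized for closing on the neighbor
        reward = (np.linalg.norm(goal - pos) - np.linalg.norm(goal - new_pos)
                  - max(0.0, 0.5 - np.linalg.norm(new_pos - nb)))
        # grad of log N(action; mean, sigma^2 I) w.r.t. W, scaled by the reward
        policy.W += lr * reward * np.outer((action - mean) / sigma**2, obs)
```

In this sketch the supervised stage plays the role of the paper's first stage (it gives the policy a sensible, low-variance starting point), and the policy-gradient stage refines behavior under the kind of noisy observations the abstract emphasizes.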

