IEEE Robotics and Automation Letters

A Two-Stage Reinforcement Learning Approach for Multi-UAV Collision Avoidance Under Imperfect Sensing



Abstract

Unlike autonomous ground vehicles (AGVs), unmanned aerial vehicles (UAVs) have a higher-dimensional configuration space, which makes motion planning for multiple UAVs a challenging task. In addition, uncertainties and noise are more significant in UAV scenarios, which increases the difficulty of autonomous multi-UAV navigation. In this letter, we propose a two-stage reinforcement learning (RL) based multi-UAV collision avoidance approach that does not explicitly model the uncertainty and noise in the environment. Our goal is to train a policy that plans collision-free trajectories from local noisy observations. However, collision avoidance policies learned through RL usually suffer from high variance and low reproducibility, because, unlike supervised learning, RL does not have a fixed training set with ground-truth labels. To address these issues, we introduce a two-stage training method for RL-based collision avoidance. In the first stage, we optimize the policy with a supervised loss function that encourages the agent to follow the well-known reciprocal collision avoidance strategy. In the second stage, we refine the policy using policy gradient. We validate our policy in a variety of simulated scenarios, and extensive numerical simulations demonstrate that it generates time-efficient, collision-free paths under imperfect sensing and handles noisy local observations with unknown noise levels well.
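The two-stage scheme described in the abstract can be sketched in miniature. The toy 2-D point-agent setup, the linear policy, and the `expert_velocity` stand-in for a reciprocal-avoidance rule below are all illustrative assumptions, not the authors' actual implementation: stage 1 regresses the policy toward the expert rule with a supervised loss, and stage 2 refines it with a REINFORCE-style policy gradient under observation noise.

```python
# Hypothetical sketch of the two-stage training scheme: supervised
# pre-training toward a reciprocal-avoidance expert, then policy-gradient
# refinement. A toy 2-D single-neighbor setting, not the paper's system.
import numpy as np

rng = np.random.default_rng(0)

def expert_velocity(pos, goal, neighbor):
    """Toy stand-in for a reciprocal collision-avoidance rule:
    head toward the goal, steering away from a nearby neighbor."""
    to_goal = goal - pos
    v = to_goal / (np.linalg.norm(to_goal) + 1e-8)
    away = pos - neighbor
    d = np.linalg.norm(away)
    if d < 1.0:  # inside the avoidance radius, add a repulsive term
        v += (1.0 - d) * away / (d + 1e-8)
    return v

class LinearPolicy:
    """Maps a 6-D observation (pos, goal, neighbor) to a 2-D velocity."""
    def __init__(self):
        self.W = rng.normal(scale=0.1, size=(2, 6))

    def act(self, obs):
        return self.W @ obs

def supervised_stage(policy, steps=2000, lr=1e-2):
    """Stage 1: SGD on 0.5*||policy(obs) - expert(obs)||^2."""
    for _ in range(steps):
        pos, goal, nb = rng.normal(size=2), rng.normal(size=2), rng.normal(size=2)
        obs = np.concatenate([pos, goal, nb])
        target = expert_velocity(pos, goal, nb)
        pred = policy.act(obs)
        # gradient of the squared loss w.r.t. W is outer(pred - target, obs)
        policy.W -= lr * np.outer(pred - target, obs)

def pg_stage(policy, episodes=200, lr=1e-3, sigma=0.1, noise=0.05):
    """Stage 2: one-step REINFORCE with Gaussian exploration and noisy obs."""
    for _ in range(episodes):
        pos, goal, nb = rng.normal(size=2), rng.normal(size=2), rng.normal(size=2)
        obs = np.concatenate([pos, goal, nb]) + rng.normal(scale=noise, size=6)
        mean = policy.act(obs)
        action = mean + rng.normal(scale=sigma, size=2)
        new_pos = pos + 0.1 * action
        # reward: progress toward the goal, penalized for closing on the neighbor
        reward = (np.linalg.norm(goal - pos) - np.linalg.norm(goal - new_pos)
                  - max(0.0, 0.5 - np.linalg.norm(new_pos - nb)))
        # grad of log N(action; mean, sigma^2 I) w.r.t. W, scaled by the reward
        policy.W += lr * reward * np.outer((action - mean) / sigma**2, obs)
```

In this sketch the supervised stage plays the role of the paper's first stage (it gives the policy a sensible, low-variance starting point), and the policy-gradient stage refines behavior under the kind of noisy observations the abstract emphasizes.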

