Conference: AIAA SciTech forum and exposition

Deep Reinforcement Learning on Intelligent Motion Video Guidance for Unmanned Air System Ground Target Tracking

Abstract

Tracking the motion of ground targets from aerial images can benefit commercial, civilian, and military applications. On small fixed-wing unmanned air systems that carry strapdown rather than gimbaled cameras, this is a challenging problem because the aircraft itself must maneuver to keep the ground targets in the camera's image frame. Previous approaches for strapdown cameras achieved satisfactory tracking performance using standard reinforcement learning algorithms. However, because the number of states and actions was restricted, these algorithms assumed constant airspeed and constant altitude. This paper addresses the ground target tracking problem with a Policy Gradient Deep Reinforcement Learning controller. Learning operates on the continuous, full aircraft state and uses multiple states and actions. Compared to previous approaches, the major advantage of this controller is its ability to handle the full-state ground target tracking case. Policies are trained for three target cases: static, constant linear motion, and random motion. Results from a simulated environment show that the trained Policy Gradient Deep Reinforcement Learning controller consistently keeps a randomly maneuvering target in the camera image frame. The sensitivity of the learning algorithm to hyperparameter selection is also investigated, since this selection can drastically impact tracking performance.
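
To illustrate why a strapdown camera couples tracking to aircraft attitude, the following is a minimal sketch of an "is the target in the image frame" check. It is not taken from the paper. It assumes a camera fixed along the body z-axis (pointing out of the belly), a flat ground plane, a North-East-Down frame with standard 3-2-1 Euler angles, and a square field of view. The names `ned_to_body`, `target_in_frame`, and `half_fov_deg` are hypothetical.

```python
import math


def ned_to_body(vec, roll, pitch, yaw):
    """Rotate a North-East-Down vector into the body frame.

    Uses the standard 3-2-1 (yaw, pitch, roll) Euler sequence; angles in radians.
    """
    n, e, d = vec
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    x = cp * cy * n + cp * sy * e - sp * d
    y = (sr * sp * cy - cr * sy) * n + (sr * sp * sy + cr * cy) * e + sr * cp * d
    z = (cr * sp * cy + sr * sy) * n + (cr * sp * sy - sr * cy) * e + cr * cp * d
    return x, y, z


def target_in_frame(aircraft_pos, attitude, target_ne, half_fov_deg=30.0):
    """Return True if a ground target falls inside the camera image frame.

    aircraft_pos: (north, east, altitude above ground) in meters.
    attitude:     (roll, pitch, yaw) in radians.
    target_ne:    (north, east) of the target on flat ground, in meters.
    half_fov_deg: assumed half field of view (hypothetical default).
    """
    north, east, altitude = aircraft_pos
    roll, pitch, yaw = attitude
    # Line of sight from aircraft to target; the target is `altitude` meters down.
    los_ned = (target_ne[0] - north, target_ne[1] - east, altitude)
    x, y, z = ned_to_body(los_ned, roll, pitch, yaw)
    if z <= 0.0:
        # Target is behind the image plane of the belly-mounted camera.
        return False
    half_fov = math.radians(half_fov_deg)
    # Angular offsets of the target from the camera boresight (body z-axis).
    return abs(math.atan2(x, z)) <= half_fov and abs(math.atan2(y, z)) <= half_fov
```

Under these assumptions, a target 100 m to the right of a level aircraft flying at 100 m sits 45 degrees off boresight and is outside a 30 degree half field of view. Rolling 45 degrees away from the target brings it back to the image center. A reward signal for a learned controller could be built on a check of this kind, although the paper's actual reward and camera model are not given in the abstract.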