Computational Intelligence and Neuroscience

Real-Time Human Detection for Aerial Captured Video Sequences via Deep Models



Abstract

Human detection in videos plays an important role in various real-life applications. Most traditional approaches depend on handcrafted features, which are problem-dependent and optimal only for specific tasks. Moreover, they are highly susceptible to dynamic events such as illumination changes, camera jitter, and variations in object size. Feature learning approaches, on the other hand, are cheaper and easier because highly abstract and discriminative features can be produced automatically without the need for expert knowledge. In this paper, we utilize automatic feature learning methods that combine optical flow with three different deep models (a supervised convolutional neural network (S-CNN), a pretrained CNN feature extractor, and a hierarchical extreme learning machine (H-ELM)) for human detection in videos captured by a nonstatic camera on an aerial platform at varying altitudes. The models are trained and tested on the publicly available and highly challenging UCF-ARG aerial dataset, and they are compared in terms of training accuracy, testing accuracy, and learning speed. The performance evaluation considers five human actions (digging, waving, throwing, walking, and running). Experimental results demonstrate that the proposed methods succeed at the human detection task. The pretrained CNN achieves an average accuracy of 98.09%. The S-CNN achieves an average accuracy of 95.6% with softmax and 91.7% with a Support Vector Machine (SVM) classifier. The H-ELM achieves an average accuracy of 95.9%. On an ordinary Central Processing Unit (CPU), training the H-ELM takes 445 seconds; training the S-CNN takes 770 seconds on a high-performance Graphics Processing Unit (GPU).
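The abstract's timing comparison (H-ELM training on a CPU versus S-CNN training on a GPU) follows from how extreme learning machines are trained: the hidden weights are random and fixed, and only the output weights are solved in closed form. The sketch below is not the paper's H-ELM; it is a minimal single-hidden-layer ELM in plain numpy, run on synthetic two-class data standing in for human/non-human feature vectors, purely to illustrate why this training step is so cheap.

```python
import numpy as np

def train_elm(X, y, n_hidden=64, seed=0):
    """Train a single-hidden-layer extreme learning machine.

    Hidden-layer weights are drawn at random and never updated;
    only the output weights (beta) are fit, via linear least squares.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input->hidden weights
    b = rng.normal(size=n_hidden)                 # random hidden biases
    H = np.tanh(X @ W + b)                        # fixed random feature map
    T = np.eye(int(y.max()) + 1)[y]               # one-hot class targets
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)  # closed-form output weights
    return W, b, beta

def predict_elm(X, W, b, beta):
    H = np.tanh(X @ W + b)
    return np.argmax(H @ beta, axis=1)

# Synthetic, well-separated two-class data (illustrative only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=-2.0, size=(100, 5)),
               rng.normal(loc=+2.0, size=(100, 5))])
y = np.array([0] * 100 + [1] * 100)

W, b, beta = train_elm(X, y)
acc = (predict_elm(X, W, b, beta) == y).mean()
```

Because the only fitting step is one least-squares solve, training cost is dominated by a single matrix factorization, which is why an ELM-style model can be trained in minutes on an ordinary CPU while a CNN trained by backpropagation needs many gradient passes even on a GPU.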
