A fast and memory saved GPU acceleration algorithm of convolutional neural networks for target detection

Abstract

Target detection is a hard real-time task in video and image processing. It has recently been performed with the feedforward pass of convolutional neural networks (CNNs), which is usually accelerated by general-purpose graphics processing units (GPUs). Two challenges remain. First, the running speed still needs to improve. Second, deeper and larger CNN models are desirable, but a more sophisticated model may not train well because GPU memory is limited. In this paper, we present two scheduling algorithms that address these challenges and improve overall system performance. The first is an efficient image combination algorithm that accelerates the CNN feedforward pass. The second is a light-memory-cost algorithm that trains an arbitrarily large CNN model on a GPU device with limited memory. We run our experiments on a GTX980 card and use a CNN model with 8 GB of model parameters, which exceeds the GPU's global memory. Compared with cuDNNv3, a speedup of 6.97x is obtained in the detection task.
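The abstract does not describe how the light-memory-cost algorithm works, so the following is only a generic illustration of the underlying constraint, not the authors' method. It is a minimal sketch, assuming that parameters are swapped onto the device in groups of consecutive layers and that each group must fit a fixed byte budget. The function name `plan_chunks`, the eight equal 1 GiB layers, and the 3 GiB budget are all hypothetical values chosen for the example.

```python
from typing import List

GIB = 1024 ** 3


def plan_chunks(layer_bytes: List[int], budget_bytes: int) -> List[List[int]]:
    """Greedily group consecutive layers so each group's parameters fit the budget.

    Hypothetical helper for illustration only; it is not taken from the paper.
    """
    chunks: List[List[int]] = []
    current: List[int] = []
    used = 0
    for idx, size in enumerate(layer_bytes):
        if size > budget_bytes:
            raise ValueError(f"layer {idx} alone exceeds the device budget")
        if used + size > budget_bytes:
            # The current group is full: close it and start a new one.
            chunks.append(current)
            current, used = [], 0
        current.append(idx)
        used += size
    if current:
        chunks.append(current)
    return chunks


# Assumed numbers: 8 GiB of parameters split into eight equal layers,
# with 3 GiB of device memory reserved for parameters.
layers = [1 * GIB] * 8
plan = plan_chunks(layers, budget_bytes=3 * GIB)
print(plan)  # [[0, 1, 2], [3, 4, 5], [6, 7]]
```

With these assumed numbers the model is processed in three groups, so at most 3 GiB of parameters is resident at any time even though the full model holds 8 GiB. A real scheduler would also have to account for activations, gradients, and host-to-device transfer time, none of which this sketch models.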

Bibliographic record

  • Source
    Neurocomputing | 2017, issue 22 | pp. 48-59 | 12 pages
  • Author affiliations

    Natl Univ Def Technol, Natl Lab Parallel & Distributed Proc, Changsha, Hunan, Peoples R China|Natl Univ Def Technol, Sch Comp, Changsha, Hunan, Peoples R China;

    Natl Univ Def Technol, Natl Lab Parallel & Distributed Proc, Changsha, Hunan, Peoples R China|Natl Univ Def Technol, Sch Comp, Changsha, Hunan, Peoples R China;

    Natl Univ Def Technol, Natl Lab Parallel & Distributed Proc, Changsha, Hunan, Peoples R China|Natl Univ Def Technol, Sch Comp, Changsha, Hunan, Peoples R China;

    Natl Univ Def Technol, Natl Lab Parallel & Distributed Proc, Changsha, Hunan, Peoples R China|Natl Univ Def Technol, Sch Comp, Changsha, Hunan, Peoples R China;

    Natl Univ Def Technol, Natl Lab Parallel & Distributed Proc, Changsha, Hunan, Peoples R China|Natl Univ Def Technol, Sch Comp, Changsha, Hunan, Peoples R China;

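  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords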

    Convolutional neural networks; GPU; Target detection;
