A fast and memory saved GPU acceleration algorithm of convolutional neural networks for target detection

Li Shijie; Dou Yong; Niu Xin; Lv Qi; Wang Qiang


Abstract

Target detection is a hard real-time task in video and image processing. It has recently been performed with the feedforward pass of convolutional neural networks (CNNs), which is usually accelerated by general-purpose graphics processing units (GPUs). Two challenges remain. First, the running speed still needs to improve. Second, deeper and larger CNN models are desirable, but a more sophisticated model may not train well when GPU memory is insufficient. In this paper, we present two scheduling algorithms that address these challenges and improve overall system performance. The first is an efficient image combination algorithm that accelerates the feedforward pass of the CNN. The second is a light-memory-cost algorithm for training an arbitrarily large CNN model on a GPU device with limited memory. We run our experiments on a GTX980 card and use a CNN model with 8 GB of model parameters, which is larger than the GPU's global memory. Compared with cuDNNv3, we obtain a speedup of 6.97x on the detection task.
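
The abstract does not say how the light-memory-cost algorithm works, so the following is only an illustrative sketch of one general way a model whose parameters exceed device memory can be handled: split consecutive layers into groups that each fit within a memory budget, so that one group at a time can reside on the GPU. It is not the authors' algorithm. The function name plan_layer_groups, the layer sizes, and the 3 GiB budget are all hypothetical.

```python
from typing import List


def plan_layer_groups(layer_bytes: List[int], budget_bytes: int) -> List[List[int]]:
    """Greedily split consecutive layers into groups that fit the budget.

    Hypothetical helper for illustration only; it is not taken from the paper.
    Each returned group is a list of layer indices whose parameter sizes
    sum to at most budget_bytes.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    used = 0
    for idx, size in enumerate(layer_bytes):
        if size > budget_bytes:
            raise ValueError(f"layer {idx} alone exceeds the memory budget")
        # Close the current group when adding this layer would overflow it.
        if current and used + size > budget_bytes:
            groups.append(current)
            current, used = [], 0
        current.append(idx)
        used += size
    if current:
        groups.append(current)
    return groups


GIB = 1024 ** 3

# Assumed per-layer parameter sizes that sum to 8 GiB (made-up values).
layer_sizes = [1 * GIB, 2 * GIB, 2 * GIB, 1 * GIB, 2 * GIB]

# Assumed budget: part of the device memory, leaving room for activations.
groups = plan_layer_groups(layer_sizes, budget_bytes=3 * GIB)
print(groups)  # [[0, 1], [2, 3], [4]]
```

In this sketch the 8 GiB of parameters end up in three groups, each small enough to be transferred to the device on its own. A real scheduler would also have to account for activations, gradients, and host-device transfer time, none of which are modeled here.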


Bibliographic record

Source
Neurocomputing, 2017, No. 22, pp. 48-59 (12 pages)
Authors
Li Shijie; Dou Yong; Niu Xin; Lv Qi; Wang Qiang
Author affiliations

Natl Univ Def Technol, Natl Lab Parallel & Distributed Proc, Changsha, Hunan, Peoples R China; Natl Univ Def Technol, Sch Comp, Changsha, Hunan, Peoples R China (listed identically for each of the five authors)

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords
Convolutional neural networks; GPU; Target detection


