Journal: Cluster Computing

cuConv: CUDA implementation of convolution for CNN inference


Abstract

Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and deploying deep CNNs, and are widely used in production. State-of-the-art implementations, however, exhibit low efficiency for some commonly used network configurations. In this paper we propose a GPU-based implementation of the convolution operation for CNN inference that favors coalesced accesses, without requiring prior data transformations. Our experiments demonstrate that it yields notable performance improvements in a range of common CNN forward-propagation convolution configurations, with speedups of up to 2.29× with respect to the best implementation in cuDNN, covering a relevant region of currently existing approaches. This improvement results in speedups of up to 7.4× for CNN online inference use cases.
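To illustrate the coalesced-access idea the abstract refers to, the following is a minimal sketch of a direct CUDA convolution kernel in which consecutive threads of a warp compute consecutive output columns, so that loads of each input row and stores of the output row are coalesced. This is not the cuConv algorithm from the paper; the NCHW layout, the one-output-per-thread mapping, and all names and parameters are assumptions chosen for illustration.

```cuda
// Sketch only: direct (valid, stride-1) convolution with coalesced accesses.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One thread per output element (n, k, y, x).
__global__ void conv2d_direct(const float* __restrict__ in,   // N x C x H x W
                              const float* __restrict__ filt, // K x C x R x S
                              float* __restrict__ out,        // N x K x P x Q
                              int N, int C, int H, int W,
                              int K, int R, int S, int P, int Q)
{
    int x  = blockIdx.x * blockDim.x + threadIdx.x;  // output column (fastest index)
    int y  = blockIdx.y * blockDim.y + threadIdx.y;  // output row
    int nk = blockIdx.z;                             // fused batch/filter index
    int n = nk / K, k = nk % K;
    if (x >= Q || y >= P) return;

    float acc = 0.0f;
    for (int c = 0; c < C; ++c)
        for (int r = 0; r < R; ++r)
            for (int s = 0; s < S; ++s) {
                // Neighbouring threads (x, x+1, ...) read neighbouring input
                // columns, so this global load is coalesced within a warp.
                float v = in[((n * C + c) * H + (y + r)) * W + (x + s)];
                float w = filt[((k * C + c) * R + r) * S + s];
                acc += v * w;
            }
    out[((n * K + k) * P + y) * Q + x] = acc;        // coalesced store
}

int main() {
    // Tiny example: N=1, C=3, H=W=32, K=8, R=S=3 -> P=Q=30 (no padding).
    int N = 1, C = 3, H = 32, W = 32, K = 8, R = 3, S = 3;
    int P = H - R + 1, Q = W - S + 1;
    std::vector<float> hIn(N * C * H * W, 1.0f), hF(K * C * R * S, 1.0f);
    std::vector<float> hOut(N * K * P * Q);

    float *dIn, *dF, *dOut;
    cudaMalloc(&dIn,  hIn.size()  * sizeof(float));
    cudaMalloc(&dF,   hF.size()   * sizeof(float));
    cudaMalloc(&dOut, hOut.size() * sizeof(float));
    cudaMemcpy(dIn, hIn.data(), hIn.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dF,  hF.data(),  hF.size()  * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(32, 4);                               // 32 threads along x -> coalescing
    dim3 grid((Q + 31) / 32, (P + 3) / 4, N * K);
    conv2d_direct<<<grid, block>>>(dIn, dF, dOut, N, C, H, W, K, R, S, P, Q);
    cudaMemcpy(hOut.data(), dOut, hOut.size() * sizeof(float), cudaMemcpyDeviceToHost);

    // With all-ones input and filters, each output equals C*R*S = 27.
    printf("out[0] = %f (expected %d)\n", hOut[0], C * R * S);
    cudaFree(dIn); cudaFree(dF); cudaFree(dOut);
    return 0;
}
```

Production libraries such as cuDNN choose among several convolution algorithms per configuration; the sketch above only shows the memory-access pattern, not the tiling, register blocking, or algorithm selection a tuned implementation would use.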
