Journal of Computational Science

Speedup deep learning models on GPU by taking advantage of efficient unstructured pruning and bit-width reduction



Abstract

This work is focused on the UnSparse-Opt framework for efficient unstructured pruning and quantisation of feedforward neural networks, and on improving their efficiency on graphics processing units (GPUs) by using a direct sparse algorithm. The NVIDIA cuDNN library is the most effective implementation of deep learning (DL) algorithms for GPUs. Weight pruning and quantisation are among the most common techniques for improving the efficiency of convolutional neural network (CNN) models. There are two main types of pruning: structured and unstructured. The first enables much easier acceleration on many types of accelerators, but it is difficult to achieve sparsity levels and accuracy as high as those obtained with unstructured pruning. Unstructured pruning with retraining can produce weight tensors with approximately 90% or more sparsity in some deep CNN models. In this article, a pruning algorithm is presented which achieves high sparsity levels without a drop in accuracy. In the next stage, linear and non-linear quantisation are applied for further reductions in time and memory footprint. Additionally, this work presents real CNN models pruned to high sparsities in which a subset of layers can run with efficiency comparable to or better than cuDNN by using a direct sparse method. Finally, it shows that sparse CNN-based architectures with reduced precision can be more efficient than the cuDNN library.
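The abstract does not spell out the pruning procedure, but unstructured magnitude pruning with retraining is the standard formulation of the technique it names. A minimal sketch, assuming PyTorch; the per-tensor threshold and the helper names are illustrative assumptions, not the UnSparse-Opt algorithm itself:

```python
import torch
import torch.nn as nn

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a boolean keep-mask that zeroes the smallest-magnitude weights."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight, dtype=torch.bool)
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight.abs() > threshold

def prune_model(model: nn.Module, sparsity: float = 0.9) -> dict:
    """Prune every Conv2d/Linear weight tensor to the target sparsity."""
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            mask = magnitude_prune(module.weight.data, sparsity)
            module.weight.data *= mask
            masks[name] = mask  # reapply after each retraining step to keep zeros fixed
    return masks
```

In the prune-with-retraining loop the abstract alludes to, the returned masks would be reapplied after every optimiser step so the pruned positions stay zero while the surviving weights recover accuracy.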

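For the quantisation stage, a sketch of symmetric per-tensor linear quantisation to a reduced bit-width follows; the symmetric per-tensor scheme is an assumption for illustration, and the paper's non-linear variant is not shown:

```python
import torch

def linear_quantise(weight: torch.Tensor, bits: int = 8):
    """Symmetric uniform quantisation of a float tensor to `bits`-wide integers."""
    qmax = 2 ** (bits - 1) - 1
    scale = weight.abs().max().clamp(min=1e-12) / qmax  # per-tensor scale factor
    q = torch.clamp(torch.round(weight / scale), -qmax, qmax)
    return q.to(torch.int8 if bits <= 8 else torch.int32), scale

def dequantise(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate float tensor from the integers and the scale."""
    return q.float() * scale
```

The bit-width reduction in the title corresponds to shrinking `bits`, which cuts the memory footprint of each weight tensor proportionally at the cost of rounding error.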

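Finally, the "direct sparse method" amounts to storing the pruned weights in a compressed format and multiplying them without touching the zeros. A toy comparison, assuming a recent PyTorch with sparse CSR support; whether this beats a dense cuDNN kernel depends on the sparsity level and layer shape, as the abstract itself notes:

```python
import torch

# Roughly 90% of a standard-normal tensor lies below |1.645|, so zeroing
# those entries mimics a heavily pruned layer.
dense_w = torch.randn(512, 512)
dense_w[dense_w.abs() < 1.645] = 0.0
sparse_w = dense_w.to_sparse_csr()   # compressed sparse row: only non-zeros stored

x = torch.randn(512, 64)
y_sparse = sparse_w @ x              # direct sparse matmul over stored non-zeros
y_dense = dense_w @ x
assert torch.allclose(y_sparse, y_dense, atol=1e-5)
```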
