Journal of Computational Science

Speedup deep learning models on GPU by taking advantage of efficient unstructured pruning and bit-width reduction



Abstract

This work is focused on the UnSparse-Opt framework for efficient unstructured pruning and quantisation of feedforward neural networks, and on improving their efficiency on graphics processing units (GPUs) by using a direct sparse algorithm. The NVIDIA cuDNN library is the most effective implementation of deep learning (DL) algorithms for GPUs. Weight pruning and quantisation are among the most common techniques for improving the efficiency of convolutional neural network (CNN) models. There are two main types of pruning: structured and unstructured. The first enables much easier acceleration on many types of accelerators, but it is difficult to achieve sparsity levels and accuracy as high as those obtained with unstructured pruning. Unstructured pruning with retraining can produce weight tensors with approximately 90% or more sparsity in some deep CNN models. In this article, a pruning algorithm is presented which achieves high sparsity levels without a drop in accuracy. In the next stage, linear and non-linear quantisation are applied for further reductions in time and memory footprint. Additionally, this work presents real CNN models pruned to high sparsities in which a subset of layers can run with efficiency comparable to or better than cuDNN by using a direct sparse method. Finally, it shows that sparse CNN-based architectures with reduced precision can be more efficient than the cuDNN library.
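The abstract does not spell out the pruning procedure, but unstructured magnitude pruning with retraining is the standard formulation of the technique it names. A minimal sketch, assuming PyTorch; the per-tensor threshold and the helper names are illustrative assumptions, not the UnSparse-Opt algorithm itself:

```python
import torch
import torch.nn as nn

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a boolean keep-mask that zeroes the smallest-magnitude weights."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight, dtype=torch.bool)
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight.abs() > threshold

def prune_model(model: nn.Module, sparsity: float = 0.9) -> dict:
    """Prune every Conv2d/Linear weight tensor to the target sparsity."""
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            mask = magnitude_prune(module.weight.data, sparsity)
            module.weight.data *= mask
            masks[name] = mask  # reapply after each retraining step to keep zeros fixed
    return masks
```

In the prune-with-retraining loop the abstract alludes to, the returned masks would be reapplied after every optimiser step so the pruned positions stay zero while the surviving weights recover accuracy.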

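For the quantisation stage, a sketch of symmetric per-tensor linear quantisation to a reduced bit-width follows; the symmetric per-tensor scheme is an assumption for illustration, and the paper's non-linear variant is not shown:

```python
import torch

def linear_quantise(weight: torch.Tensor, bits: int = 8):
    """Symmetric uniform quantisation of a float tensor to `bits`-wide integers."""
    qmax = 2 ** (bits - 1) - 1
    scale = weight.abs().max().clamp(min=1e-12) / qmax  # per-tensor scale factor
    q = torch.clamp(torch.round(weight / scale), -qmax, qmax)
    return q.to(torch.int8 if bits <= 8 else torch.int32), scale

def dequantise(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate float tensor from the integers and the scale."""
    return q.float() * scale
```

The bit-width reduction in the title corresponds to shrinking `bits`, which cuts the memory footprint of each weight tensor proportionally at the cost of rounding error.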

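Finally, the "direct sparse method" amounts to storing the pruned weights in a compressed format and multiplying them without touching the zeros. A toy comparison, assuming a recent PyTorch with sparse CSR support; whether this beats a dense cuDNN kernel depends on the sparsity level and layer shape, as the abstract itself notes:

```python
import torch

# Roughly 90% of a standard-normal tensor lies below |1.645|, so zeroing
# those entries mimics a heavily pruned layer.
dense_w = torch.randn(512, 512)
dense_w[dense_w.abs() < 1.645] = 0.0
sparse_w = dense_w.to_sparse_csr()   # compressed sparse row: only non-zeros stored

x = torch.randn(512, 64)
y_sparse = sparse_w @ x              # direct sparse matmul over stored non-zeros
y_dense = dense_w @ x
assert torch.allclose(y_sparse, y_dense, atol=1e-5)
```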
