International Conference on High Performance Computing, Data, and Analytics

Efficient Sparse Neural Networks Using Regularized Multi Block Sparsity Pattern on a GPU


Abstract

A large portion of the computation in sparse neural networks consists of multiplying a sparse matrix with a dense matrix, denoted SDMM in this paper. The SDMM operation with an unstructured sparsity pattern cannot be processed efficiently on modern architectures such as GPUs due to irregular compute and memory accesses. However, efficient parallel GPU algorithms can be designed for SDMM when the sparsity pattern is more structured. Thus, the run time performance of sparse neural networks on a GPU depends on the sparsity pattern present in the underlying matrix. In sparse neural networks obtained through pruning-based approaches, the choice of sparsity pattern affects not only the run time but also the accuracy on the task for which the network is trained. Sparsity patterns that yield good run time performance on a GPU may yield poor accuracy, and vice versa. The real challenge, then, is, given a target architecture, to identify a sparsity pattern, a storage format, and an SDMM algorithm that together lead to sparse neural networks that are efficient in both run time and accuracy. In this work, we propose a novel, structured, flexible, and generic sparsity pattern called the RMB (Regularized Multi Block) sparsity pattern, an efficient storage format (CRMB), and a fast GPU algorithm for RMBMM (SDMM in which the multiplicand has the RMB sparsity pattern). Using the RMB sparsity pattern, we achieve better trade-offs between accuracy and run time performance of sparse neural networks on a GPU compared to commonly used patterns such as unstructured and block sparsity.
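
The abstract does not spell out the RMB pattern or the CRMB storage layout, so the sketch below only illustrates the generic block-structured SDMM idea the paper builds on: a minimal NumPy version in which the sparse multiplicand is stored as a coordinate list of fixed-size nonzero blocks. The function name block_sparse_sdmm, the (blk_row, blk_col, blk_val) layout, and the block size are illustrative assumptions, not the authors' CRMB format.

    import numpy as np

    def block_sparse_sdmm(n_rows, block_size, blk_row, blk_col, blk_val, dense):
        # Multiply a block-sparse matrix with a dense matrix. The sparse matrix
        # is stored as a list of nonzero block_size x block_size blocks: block k
        # sits at row blk_row[k]*block_size, column blk_col[k]*block_size and
        # holds the dense values blk_val[k].
        out = np.zeros((n_rows, dense.shape[1]))
        for k in range(len(blk_val)):
            r0 = blk_row[k] * block_size
            c0 = blk_col[k] * block_size
            # Each nonzero block contributes one small dense GEMM over a
            # contiguous slice of the dense operand; this regularity is what
            # structured patterns expose to a GPU, unlike element-wise
            # unstructured sparsity.
            out[r0:r0 + block_size] += blk_val[k] @ dense[c0:c0 + block_size]
        return out

    # Usage: a 4x4 sparse matrix with two nonzero 2x2 blocks, times a 4x3 dense matrix.
    B = 2
    blk_row = [0, 1]  # block-row index of each nonzero block
    blk_col = [1, 0]  # block-column index of each nonzero block
    blk_val = [np.arange(4.0).reshape(B, B), np.ones((B, B))]
    dense = np.random.rand(4, 3)
    result = block_sparse_sdmm(4, B, blk_row, blk_col, blk_val, dense)

    # Check against the equivalent fully dense product.
    full = np.zeros((4, 4))
    full[0:2, 2:4] = blk_val[0]
    full[2:4, 0:2] = blk_val[1]
    assert np.allclose(result, full @ dense)

On a GPU, each nonzero block maps naturally to a thread block performing a small dense matrix product, which is why block-structured SDMM runs faster than unstructured element-wise sparsity. The trade-off the abstract highlights is that coarser blocks constrain where pruning can remove weights, which can hurt accuracy; the RMB pattern is proposed as a more flexible middle ground between these two extremes.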