International Conference on High Performance Computing, Data, and Analytics

Efficient Sparse Neural Networks Using Regularized Multi Block Sparsity Pattern on a GPU


Abstract

A large portion of the computation in sparse neural networks consists of multiplying a sparse matrix with a dense matrix, denoted SDMM in this paper. The SDMM operation with an unstructured sparsity pattern cannot be processed efficiently on modern architectures such as GPUs due to irregular compute and memory accesses. However, efficient parallel GPU algorithms can be designed for SDMM when the sparsity pattern is more structured. Thus, the run time performance of sparse neural networks on a GPU depends on the sparsity pattern present in the underlying matrix. In sparse neural networks obtained through pruning-based approaches, the choice of sparsity pattern affects not only the run time but also the accuracy on the task for which the network is trained. Sparsity patterns that yield good run time performance on a GPU may yield poor accuracy, and vice versa. The real challenge, then, is, given a target architecture, to identify a sparsity pattern, a storage format, and an SDMM algorithm that together lead to sparse neural networks that are efficient in both run time and accuracy. In this work, we propose a novel, structured, flexible, and generic sparsity pattern called the RMB (Regularized Multi Block) sparsity pattern, an efficient storage format (CRMB), and a fast GPU algorithm for RMBMM (SDMM in which the multiplicand has the RMB sparsity pattern). Using the RMB sparsity pattern, we achieve better trade-offs between accuracy and run time performance of sparse neural networks on a GPU compared to commonly used patterns such as unstructured and block sparsity.
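
The abstract does not spell out the RMB pattern or the CRMB storage layout, so the sketch below only illustrates the generic block-structured SDMM idea the paper builds on: a minimal NumPy version in which the sparse multiplicand is stored as a coordinate list of fixed-size nonzero blocks. The function name block_sparse_sdmm, the (blk_row, blk_col, blk_val) layout, and the block size are illustrative assumptions, not the authors' CRMB format.

    import numpy as np

    def block_sparse_sdmm(n_rows, block_size, blk_row, blk_col, blk_val, dense):
        # Multiply a block-sparse matrix with a dense matrix. The sparse matrix
        # is stored as a list of nonzero block_size x block_size blocks: block k
        # sits at row blk_row[k]*block_size, column blk_col[k]*block_size and
        # holds the dense values blk_val[k].
        out = np.zeros((n_rows, dense.shape[1]))
        for k in range(len(blk_val)):
            r0 = blk_row[k] * block_size
            c0 = blk_col[k] * block_size
            # Each nonzero block contributes one small dense GEMM over a
            # contiguous slice of the dense operand; this regularity is what
            # structured patterns expose to a GPU, unlike element-wise
            # unstructured sparsity.
            out[r0:r0 + block_size] += blk_val[k] @ dense[c0:c0 + block_size]
        return out

    # Usage: a 4x4 sparse matrix with two nonzero 2x2 blocks, times a 4x3 dense matrix.
    B = 2
    blk_row = [0, 1]  # block-row index of each nonzero block
    blk_col = [1, 0]  # block-column index of each nonzero block
    blk_val = [np.arange(4.0).reshape(B, B), np.ones((B, B))]
    dense = np.random.rand(4, 3)
    result = block_sparse_sdmm(4, B, blk_row, blk_col, blk_val, dense)

    # Check against the equivalent fully dense product.
    full = np.zeros((4, 4))
    full[0:2, 2:4] = blk_val[0]
    full[2:4, 0:2] = blk_val[1]
    assert np.allclose(result, full @ dense)

On a GPU, each nonzero block maps naturally to a thread block performing a small dense matrix product, which is why block-structured SDMM runs faster than unstructured element-wise sparsity. The trade-off the abstract highlights is that coarser blocks constrain where pruning can remove weights, which can hurt accuracy; the RMB pattern is proposed as a more flexible middle ground between these two extremes.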