A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors

Weifeng Liu; Brian Vinter

首页> 外文期刊>Journal of Parallel and Distributed Computing >A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors

【24h】

A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors

机译：GPU和异构处理器上的通用稀疏矩阵矩阵乘法的框架

获取原文

获取原文并翻译 | 示例

开具论文收录证明 >>

页面导航

摘要
著录项
引文网络
相似文献
相关主题

摘要

General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method (AMG), breadth first search and shortest path problem. Compared to other sparse BLAS routines, an efficient parallel SpGEMM implementation has to handle extra irregularity from three aspects: (1) the number of nonzero entries in the resulting sparse matrix is unknown in advance, (2) very expensive parallel insert operations at random positions in the resulting sparse matrix dominate the execution time, and (3) load balancing must account for sparse data in both input matrices. In this work we propose a framework for SpGEMM on GPUs and emerging CPU-GPU heterogeneous processors. This framework particularly focuses on the above three problems. Memory pre-allocation for the resulting matrix is organized by a hybrid method that saves a large amount of global memory space and efficiently utilizes the very limited on-chip scratchpad memory. Parallel insert operations of the nonzero entries are implemented through the GPU merge path algorithm that is experimentally found to be the fastest GPU merge approach. Load balancing builds on the number of necessary arithmetic operations on the nonzero entries and is guaranteed in all stages. Compared with the state-of-the-art CPU and GPU SpGEMM methods, our approach delivers excellent absolute performance and relative speedups on various benchmarks multiplying matrices with diverse sparsity structures. Furthermore, on heterogeneous processors, our SpGEMM approach achieves higher throughput by using re-allocatable shared virtual memory.

机译：通用稀疏矩阵矩阵乘法（SpGEMM）是许多应用程序（例如代数多重网格方法（AMG），广度优先搜索和最短路径问题）的基本构建块。与其他稀疏BLAS例程相比，有效的并行SpGEMM实现必须从三个方面处理额外的不规则性：（1）预先知道所得稀疏矩阵中非零条目的数量，（2）在随机位置非常昂贵的并行插入操作在生成的稀疏矩阵中，执行时间占主导地位，并且（3）负载平衡必须考虑两个输入矩阵中的稀疏数据。在这项工作中，我们为GPU和新兴的CPU-GPU异构处理器上的SpGEMM提出了一个框架。该框架特别关注上述三个问题。最终矩阵的存储器预分配是通过一种混合方法来组织的，该方法可以节省大量的全局存储器空间并有效利用非常有限的片上暂存器存储器。非零条目的并行插入操作是通过GPU合并路径算法实现的，该算法在实验上被认为是最快的GPU合并方法。负载平衡建立在对非零条目的必要算术运算的数量上，并在所有阶段得到保证。与最新的CPU和GPU SpGEMM方法相比，我们的方法可在各种基准上提供出色的绝对性能和相对加速，从而将矩阵与各种稀疏结构相乘。此外，在异构处理器上，我们的SpGEMM方法通过使用可重新分配的共享虚拟内存来实现更高的吞吐量。

著录项

来源
《Journal of Parallel and Distributed Computing》 |2015年第11期|47-61|共15页
作者
Weifeng Liu; Brian Vinter;
展开▼
作者单位

Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17,2700 Copenhagen, Denmark;

Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17,2700 Copenhagen, Denmark;

展开▼
收录信息
原文格式 PDF
正文语种 eng
中图分类
关键词
Sparse matrix; Sparse matrix-matrix multiplication; Linear algebra; GPU; Heterogeneous processor; Merging; Parallel algorithm;

机译：稀疏矩阵;稀疏矩阵矩阵乘法;线性代数GPU;异构处理器;合并;并行算法;

相似文献

外文文献
中文文献
专利

1. Register-based Implementation of the Sparse General Matrix-Matrix Multiplication on GPUs [J] . Junhong Liu, Xin He, Weifeng Liu, ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages . 2018,第1期

机译：基于寄存器的GPU稀疏常规矩阵矩阵乘法的实现
2. Multithreaded sparse matrix-matrix multiplication for many-core and GPU architectures [J] . Deveci Mehmet, Trott Christian, Rajamanickam Sivasankaran Parallel Computing . 2018,第octa期

机译：适用于多核和GPU架构的多线程稀疏矩阵矩阵乘法
3. Optimizing Sparse Matrix-Matrix Multiplication for the GPU [J] . Dalton Steven, Olson Luke, Bell Nathan ACM transactions on mathematical software . 2015,第4期

机译：为GPU优化稀疏矩阵-矩阵乘法
4. Efficient Sparse-Dense Matrix-Matrix Multiplication on GPUs Using the Customized Sparse Storage Format [C] . Shaohuai Shi, Qiang Wang, Xiaowen Chu IEEE International Conference on Parallel and Distributed Systems . 2020

机译：使用定制稀疏存储格式的高效稀疏密集矩阵矩阵乘法
5. Optimizing Tall-and-skinny Matrix-matrix Multiplication on GPUs [D] . Xiong, Nan 2018

机译：在GPU上优化高而瘦的矩阵矩阵乘法
6. A unified framework for sparse non-negative least squares using multiplicative updates and the non-negative matrix factorization problem [O] . Igor Fedorov, Alican Nalci, Ritwik Giri, -1

机译：使用乘法更新和非负矩阵分解问题的稀疏非负最小二乘的统一框架
7. A Framework for General Sparse Matrix-Matrix Multiplication on GPUs and Heterogeneous Processors [O] . Liu, Weifeng, Vinter, Brian 2015

机译：GpU和GpU上的一般稀疏矩阵 - 矩阵乘法框架异构处理器

A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors

摘要

著录项

引文网络

相似文献

相关主题

期刊订阅