首页> 外文期刊>Journal of Parallel and Distributed Computing >A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors
【24h】

A framework for general sparse matrix-matrix multiplication on GPUs and heterogeneous processors

机译:GPU和异构处理器上的通用稀疏矩阵矩阵乘法的框架

获取原文
获取原文并翻译 | 示例

摘要

General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method (AMG), breadth first search and shortest path problem. Compared to other sparse BLAS routines, an efficient parallel SpGEMM implementation has to handle extra irregularity from three aspects: (1) the number of nonzero entries in the resulting sparse matrix is unknown in advance, (2) very expensive parallel insert operations at random positions in the resulting sparse matrix dominate the execution time, and (3) load balancing must account for sparse data in both input matrices. In this work we propose a framework for SpGEMM on GPUs and emerging CPU-GPU heterogeneous processors. This framework particularly focuses on the above three problems. Memory pre-allocation for the resulting matrix is organized by a hybrid method that saves a large amount of global memory space and efficiently utilizes the very limited on-chip scratchpad memory. Parallel insert operations of the nonzero entries are implemented through the GPU merge path algorithm that is experimentally found to be the fastest GPU merge approach. Load balancing builds on the number of necessary arithmetic operations on the nonzero entries and is guaranteed in all stages. Compared with the state-of-the-art CPU and GPU SpGEMM methods, our approach delivers excellent absolute performance and relative speedups on various benchmarks multiplying matrices with diverse sparsity structures. Furthermore, on heterogeneous processors, our SpGEMM approach achieves higher throughput by using re-allocatable shared virtual memory.
机译:通用稀疏矩阵矩阵乘法(SpGEMM)是许多应用程序(例如代数多重网格方法(AMG),广度优先搜索和最短路径问题)的基本构建块。与其他稀疏BLAS例程相比,有效的并行SpGEMM实现必须从三个方面处理额外的不规则性:(1)预先知道所得稀疏矩阵中非零条目的数量,(2)在随机位置非常昂贵的并行插入操作在生成的稀疏矩阵中,执行时间占主导地位,并且(3)负载平衡必须考虑两个输入矩阵中的稀疏数据。在这项工作中,我们为GPU和新兴的CPU-GPU异构处理器上的SpGEMM提出了一个框架。该框架特别关注上述三个问题。最终矩阵的存储器预分配是通过一种混合方法来组织的,该方法可以节省大量的全局存储器空间并有效利用非常有限的片上暂存器存储器。非零条目的并行插入操作是通过GPU合并路径算法实现的,该算法在实验上被认为是最快的GPU合并方法。负载平衡建立在对非零条目的必要算术运算的数量上,并在所有阶段得到保证。与最新的CPU和GPU SpGEMM方法相比,我们的方法可在各种基准上提供出色的绝对性能和相对加速,从而将矩阵与各种稀疏结构相乘。此外,在异构处理器上,我们的SpGEMM方法通过使用可重新分配的共享虚拟内存来实现更高的吞吐量。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号