International Journal of Parallel Programming

Register-Aware Optimizations for Parallel Sparse Matrix-Matrix Multiplication



Abstract

General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block of a number of high-level algorithms and real-world applications. In recent years, several efficient SpGEMM algorithms have been proposed for many-core processors such as GPUs. However, their implementations of sparse accumulators, the core component of SpGEMM, mostly use low-speed on-chip shared memory and global memory, while high-speed registers are seriously underutilised. In this paper, we propose three novel register-aware SpGEMM algorithms for three representative sparse accumulators: sort, merge and hash. We fully utilise the GPU registers to fetch data, perform computations and store results. In the experiments, our algorithms deliver excellent performance on a benchmark suite of 205 sparse matrices from the SuiteSparse Matrix Collection. Specifically, on an Nvidia Pascal P100 GPU, our three register-aware sparse accumulators achieve on average 2.0x (up to 5.4x), 2.6x (up to 10.5x) and 1.7x (up to 5.2x) speedups over their original implementations in the libraries bhSPARSE, RMerge and NSPARSE, respectively.
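To make the role of a sparse accumulator concrete, the sketch below computes one row of C = A * B for CSR matrices using a hash-style accumulator (a plain dictionary). This is a minimal illustrative model only, not the paper's method: the paper's contribution is keeping these per-row partial products in GPU registers rather than in shared or global memory, and the function name and CSR argument layout here are assumptions for the example.

```python
# Hash-style sparse accumulator for one row of C = A * B (CSR format).
# A_ptr/A_idx/A_val and B_ptr/B_idx/B_val are the standard CSR arrays:
# row pointers, column indices, and nonzero values.

def spgemm_row(row, A_ptr, A_idx, A_val, B_ptr, B_idx, B_val):
    acc = {}  # accumulator: column index -> partial sum
    # For each nonzero A[row, j], scatter a * B[j, :] into the accumulator.
    for k in range(A_ptr[row], A_ptr[row + 1]):
        j, a = A_idx[k], A_val[k]
        for t in range(B_ptr[j], B_ptr[j + 1]):
            col = B_idx[t]
            acc[col] = acc.get(col, 0.0) + a * B_val[t]
    # Emit the row's nonzeros in sorted column order (CSR convention).
    cols = sorted(acc)
    return cols, [acc[c] for c in cols]


# Example: A = [[1, 2], [0, 3]], B = [[4, 0], [5, 6]], so row 0 of
# C = A * B is [14, 12].
cols, vals = spgemm_row(
    0,
    [0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0],   # A in CSR
    [0, 1, 3], [0, 0, 1], [4.0, 5.0, 6.0],   # B in CSR
)
print(cols, vals)  # -> [0, 1] [14.0, 12.0]
```

The sort and merge accumulators the paper also optimises solve the same gather/scatter problem differently: sort collects all partial products and sorts them by column index before reducing, while merge combines pre-sorted partial rows pairwise.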
