International Journal of Parallel Programming

Register-Aware Optimizations for Parallel Sparse Matrix-Matrix Multiplication



Abstract

General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block of a number of high-level algorithms and real-world applications. In recent years, several efficient SpGEMM algorithms have been proposed for many-core processors such as GPUs. However, their implementations of sparse accumulators, the core component of SpGEMM, mostly use low-speed on-chip shared memory and global memory, while high-speed registers are seriously underutilised. In this paper, we propose three novel register-aware SpGEMM algorithms for three representative sparse accumulators: sort, merge and hash. We fully utilise the GPU registers to fetch data, perform computations and store results. In the experiments, our algorithms deliver excellent performance on a benchmark suite of 205 sparse matrices from the SuiteSparse Matrix Collection. Specifically, on an Nvidia Pascal P100 GPU, our three register-aware sparse accumulators achieve on average 2.0x (up to 5.4x), 2.6x (up to 10.5x) and 1.7x (up to 5.2x) speedups over their original implementations in the libraries bhSPARSE, RMerge and NSPARSE, respectively.
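To make the role of a sparse accumulator concrete, the sketch below computes one row of C = A * B for CSR matrices using a hash-style accumulator (a plain dictionary). This is a minimal illustrative model only, not the paper's method: the paper's contribution is keeping these per-row partial products in GPU registers rather than in shared or global memory, and the function name and CSR argument layout here are assumptions for the example.

```python
# Hash-style sparse accumulator for one row of C = A * B (CSR format).
# A_ptr/A_idx/A_val and B_ptr/B_idx/B_val are the standard CSR arrays:
# row pointers, column indices, and nonzero values.

def spgemm_row(row, A_ptr, A_idx, A_val, B_ptr, B_idx, B_val):
    acc = {}  # accumulator: column index -> partial sum
    # For each nonzero A[row, j], scatter a * B[j, :] into the accumulator.
    for k in range(A_ptr[row], A_ptr[row + 1]):
        j, a = A_idx[k], A_val[k]
        for t in range(B_ptr[j], B_ptr[j + 1]):
            col = B_idx[t]
            acc[col] = acc.get(col, 0.0) + a * B_val[t]
    # Emit the row's nonzeros in sorted column order (CSR convention).
    cols = sorted(acc)
    return cols, [acc[c] for c in cols]


# Example: A = [[1, 2], [0, 3]], B = [[4, 0], [5, 6]], so row 0 of
# C = A * B is [14, 12].
cols, vals = spgemm_row(
    0,
    [0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0],   # A in CSR
    [0, 1, 3], [0, 0, 1], [4.0, 5.0, 6.0],   # B in CSR
)
print(cols, vals)  # -> [0, 1] [14.0, 12.0]
```

The sort and merge accumulators the paper also optimises solve the same gather/scatter problem differently: sort collects all partial products and sorts them by column index before reducing, while merge combines pre-sorted partial rows pairwise.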
