Optimizing SpMV for Diagonal Sparse Matrices on GPU

Abstract

Sparse matrix-vector multiplication (SpMV) is an important computational kernel in scientific applications, and its performance depends heavily on the nonzero distribution of the sparse matrix. In this paper, we propose a new storage format for diagonal sparse matrices, called Compressed Row Segment with Diagonal-pattern (CRSD). In CRSD, we design diagonal patterns to represent the distribution of the diagonals. Because Graphics Processing Units (GPUs) offer tremendous computational power and OpenCL makes them more suitable for scientific computing, we implement SpMV for the CRSD format on GPUs using OpenCL. Since OpenCL kernels are compiled at runtime, we design a code generator that produces codelets for all diagonal patterns once the matrix has been stored in CRSD format. Specifically, the generated codelets already contain the index information of the nonzeros, which reduces memory pressure during the SpMV operation. Furthermore, the code generator exploits properties of the GPU memory architecture and thread scheduling to improve performance. In the evaluation, we select four storage formats from prior state-of-the-art GPU implementations (Bell and Garland, 2009). Experimental results show speedups of up to 1.52× and 1.94× over the best-performing of these four formats in double and single precision, respectively. We also evaluate our approach on a two-socket quad-core Intel Xeon system, where the speedups reach up to 11.93× and 12.79× over the CSR format with 8 threads in double and single precision, respectively.
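
The abstract does not spell out the CRSD layout, so what follows is only a minimal Python sketch of the two ideas it names, under stated assumptions. `dia_spmv` performs SpMV over the classic DIA (diagonal-offset) layout, which stands in for CRSD here and is not the authors' format. `generate_codelet` is a hypothetical generator that bakes the diagonal offsets into OpenCL C source as literal constants, so the generated kernel never loads column indices from memory. Both function names, the kernel name `spmv_dia`, and its signature are illustrative assumptions, not the paper's code.

```python
# Minimal sketch, assuming the classic DIA (diagonal-offset) layout.
# This is NOT the paper's CRSD format, whose exact layout the abstract omits.


def dia_spmv(n, offsets, diags, x):
    """Compute y = A @ x for an n x n matrix stored by diagonals.

    offsets[k] is the offset of diagonal k (0 = main, > 0 = above, < 0 = below).
    diags[k][i] holds A[i][i + offsets[k]]; slots that fall outside the
    matrix are padding and are never read.
    """
    y = [0.0] * n
    for off, d in zip(offsets, diags):
        # Only rows whose column index i + off lies inside [0, n) contribute.
        start = max(0, -off)
        stop = min(n, n - off)
        for i in range(start, stop):
            y[i] += d[i] * x[i + off]
    return y


def generate_codelet(offsets):
    """Emit OpenCL C source with the diagonal offsets baked in as literals.

    Hypothetical illustration of a generated codelet that already "contains
    the index information": no column-index array is passed or loaded.
    The kernel name and signature are assumptions. The flattened layout
    diags[k * n + i] is diagonal-major, so adjacent work-items (adjacent i)
    read adjacent addresses.
    """
    src = [
        "__kernel void spmv_dia(const int n, __global const float *diags,",
        "                       __global const float *x, __global float *y) {",
        "  int i = get_global_id(0);",
        "  if (i >= n) return;",
        "  float sum = 0.0f;",
    ]
    for k, off in enumerate(offsets):
        # One unrolled, bounds-checked term per diagonal.
        src.append(
            f"  if (i + ({off}) >= 0 && i + ({off}) < n) "
            f"sum += diags[{k} * n + i] * x[i + ({off})];"
        )
    src += ["  y[i] = sum;", "}"]
    return "\n".join(src)


if __name__ == "__main__":
    # Tridiagonal example: 2 on the main diagonal, -1 above and below.
    offsets = [-1, 0, 1]
    diags = [[0.0, -1.0, -1.0, -1.0], [2.0] * 4, [-1.0, -1.0, -1.0, 0.0]]
    print(dia_spmv(4, offsets, diags, [1.0] * 4))  # [1.0, 0.0, 0.0, 1.0]
    print(generate_codelet(offsets))
```

For the 4 × 4 tridiagonal matrix in the example, the kernel source contains one unrolled term per diagonal and no index array. Storing each diagonal contiguously is a common choice for GPU coalescing. Whether CRSD's "row segments" organize data this way is not stated in the abstract.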

机译：稀疏矩阵 - 矢量乘法（SPMV）是科学应用中的重要计算内核。它的性能高度取决于稀疏矩阵的非零分布。在本文中，我们提出了一种用于对角线稀疏矩阵的新存储格式，定义为具有对角线图案（CRSD）的压缩行段。在CRSD中，我们设计对角线模式以表示对角线分布。由于图形处理单元（GPU）具有巨大的计算功率和OpenCL使它们更适合于科学计算，我们使用OpenCL在GPU上实现SPMV的CRSD格式。由于OpenCL内核在运行时符合运行时，我们设计代码生成器以在将矩阵存储到CRSD格式之后为所有对角线模式生成代码单元。具体地，所生成的Codelet已经包含非安利斯的索引信息，这在SPMV操作期间降低了存储器压力。此外，代码生成器还利用GPU上的内存架构和线程计划的属性来提高性能。在评估中，我们在GPU上从先前最先进的实现（Bell和Garland，2009）中选择四种存储格式。实验结果表明，与分别为双格式和单精度的四种格式的最佳实现相比，加速度高达1.52和1.94。我们还评估了一个双套接字四核英特尔Xeon系统。加速度高达11.93和12.79，相比之下，分别为双线和单个精度的8个线程。

Bibliographic record

Source: International Conference on Parallel Processing, 2011, 10 pages
Authors: Sun Xiangzheng; Zhang Yunquan; Wang Ting; Zhang Xianyi; Yuan Liang; Rao Li
Original format: PDF
Chinese Library Classification: TP311.133.2-53

Similar publications

1. SpMV and BiCG-Stab optimization for a class of hepta-diagonal-sparse matrices on GPU [J]. Al-Mouhamed Mayez A., Khan Ayaz H. Journal of Supercomputing, 2017, no. 9.

机译：SpMV和BiCG-Stab优化针对GPU上的一类七对角稀疏矩阵
2. Sparse matrix partitioning for optimizing SpMV on CPU-GPU heterogeneous platforms [J]. Benatia Akrem, Ji Weixing, Wang Yizhuo. Experimental Mechanics, 2020, no. 1.

机译：稀疏矩阵划分，用于在CPU-GPU异构平台上优化SpMV
3. BestSF: A Sparse Meta-Format for Optimizing SpMV on GPU [J]. Benatia Akrem, Ji Weixing, Wang Yizhuo. ACM Transactions on Architecture and Code Optimization, 2018, no. 3.

机译：最好的稀疏元格式，可以在GPU上优化SPMV
4. Optimizing SpMV for Diagonal Sparse Matrices on GPU [C]. Sun Xiangzheng, Zhang Yunquan, Wang Ting. 2011 International Conference on Parallel Processing, 2011.

机译：在GPU上为对角稀疏矩阵优化SpMV
5. Developing a New Storage Format and a Warp-Based SpMV Kernel for Configuration Interaction Sparse Matrices on the GPU [D]. Mahmoud, Mohammed. 2018.

机译：为GPU上的配置交互稀疏矩阵开发新的存储格式和基于Warp的SpMV内核
6. Sparsity estimation from compressive projections via sparse random matrices [O]. Chiara Ravazzi, Sophie Fosson, Tiziano Bianchi.

机译：通过稀疏随机矩阵从压缩投影进行稀疏估计
7. Sparse matrix partitioning for optimizing SpMV on CPU-GPU heterogeneous platforms [O]. Akrem Benatia, Weixing Ji, Yizhuo Wang. 2019.

机译：用于优化CPU-GPU异构平台SPMV的稀疏矩阵分区

Optimizing SpMV for Diagonal Sparse Matrices on GPU

摘要

著录项

相似文献

相关主题

期刊订阅