Optimizing SpMV for Diagonal Sparse Matrices on GPU

Abstract

Sparse matrix-vector multiplication (SpMV) is an important computational kernel in scientific applications, and its performance depends heavily on how the nonzeros of the sparse matrix are distributed. In this paper, we propose a new storage format for diagonal sparse matrices, called Compressed Row Segment with Diagonal-pattern (CRSD). In CRSD, we design diagonal patterns to represent the distribution of the diagonals. Because Graphics Processing Units (GPUs) offer tremendous computational power and OpenCL makes them more suitable for scientific computing, we implement SpMV for the CRSD format on GPUs using OpenCL. Since OpenCL kernels are compiled at runtime, we design a code generator that produces codelets for all diagonal patterns after a matrix has been stored in CRSD format. The generated codelets already contain the index information of the nonzeros, which reduces memory pressure during the SpMV operation. The code generator also exploits properties of the GPU memory architecture and thread scheduling to improve performance. For the evaluation, we select four storage formats from prior state-of-the-art GPU implementations (Bell and Garland, 2009). Experimental results show speedups of up to 1.52× in double precision and 1.94× in single precision over the best-performing implementation among those four formats. We also evaluate on a two-socket quad-core Intel Xeon system, where the speedups reach up to 11.93× (double precision) and 12.79× (single precision) compared with the CSR format using 8 threads.
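
Two ideas in the abstract, storing a matrix by its diagonals and generating kernels whose nonzero index information is baked into the code, can be illustrated with a small sketch. This is not CRSD itself: the abstract does not describe CRSD's row-segment layout, so the sketch assumes a plain DIA-style layout (one array per diagonal, addressed by its offset from the main diagonal) and generates Python source instead of OpenCL kernels. All function names (`to_diagonals`, `spmv_dia`, `generate_codelet_source`, `compile_codelet`) are hypothetical.

```python
# Hypothetical sketch: DIA-style diagonal storage plus a "codelet" generator
# that bakes diagonal offsets into generated source. Not the CRSD format.


def to_diagonals(A, offsets):
    """Store a dense n x n matrix by diagonals: data[k][i] = A[i][i + offsets[k]].

    Entries that fall outside the matrix are padded with 0.0.
    """
    n = len(A)
    return [[A[i][i + off] if 0 <= i + off < n else 0.0 for i in range(n)]
            for off in offsets]


def spmv_dia(offsets, data, x):
    """Generic kernel: reads the offset array at run time for every diagonal."""
    n = len(x)
    y = [0.0] * n
    for k, off in enumerate(offsets):
        # Only rows whose column i + off lies inside the matrix contribute.
        for i in range(max(0, -off), min(n, n - off)):
            y[i] += data[k][i] * x[i + off]
    return y


def generate_codelet_source(offsets):
    """Emit Python source for a kernel specialized to one diagonal pattern.

    Each offset is written into the source as a literal, so the generated
    kernel never loads an offset array while computing y = A x.
    """
    lines = ["def codelet(data, x):",
             "    n = len(x)",
             "    y = [0.0] * n"]
    for k, off in enumerate(offsets):
        lines.append(f"    d = data[{k}]")
        lines.append(f"    for i in range(max(0, {-off}), min(n, n - ({off}))):")
        lines.append(f"        y[i] += d[i] * x[i + ({off})]")
    lines.append("    return y")
    return "\n".join(lines)


def compile_codelet(offsets):
    """Compile the generated source at run time (a stand-in for OpenCL JIT)."""
    namespace = {}
    exec(generate_codelet_source(offsets), namespace)
    return namespace["codelet"]


if __name__ == "__main__":
    # 4 x 4 tridiagonal matrix: diagonals at offsets -1, 0, +1.
    A = [[4.0, 1.0, 0.0, 0.0],
         [1.0, 4.0, 1.0, 0.0],
         [0.0, 1.0, 4.0, 1.0],
         [0.0, 0.0, 1.0, 4.0]]
    offsets = [-1, 0, 1]
    data = to_diagonals(A, offsets)
    x = [1.0, 2.0, 3.0, 4.0]
    print(spmv_dia(offsets, data, x))         # [6.0, 12.0, 18.0, 19.0]
    print(compile_codelet(offsets)(data, x))  # [6.0, 12.0, 18.0, 19.0]
    print(generate_codelet_source(offsets))
```

The generic kernel looks up `offsets` at run time, while the generated codelet carries each offset as a literal; that is the property the abstract credits with reducing memory pressure. On a GPU, the analogous step would presumably emit OpenCL C source per diagonal pattern and compile it at runtime.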

Bibliographic details

Source: 2011 International Conference on Parallel Processing | 2011 | pp. 492-501 | 10 pages
Authors: Sun Xiangzheng; Zhang Yunquan; Wang Ting; Zhang Xianyi; Yuan Liang; Rao Li
Original format: PDF
Chinese Library Classification: TP311.133.2

Similar literature

1. SpMV and BiCG-Stab optimization for a class of hepta-diagonal-sparse matrices on GPU [J]. Al-Mouhamed Mayez A., Khan Ayaz H. Journal of Supercomputing, 2017, no. 9

2. Sparse matrix partitioning for optimizing SpMV on CPU-GPU heterogeneous platforms [J]. Benatia Akrem, Ji Weixing, Wang Yizhuo. Experimental Mechanics, 2020, no. 1

3. BestSF: A Sparse Meta-Format for Optimizing SpMV on GPU [J]. Benatia Akrem, Ji Weixing, Wang Yizhuo. ACM Transactions on Architecture and Code Optimization, 2018, no. 3

4. Optimizing SpMV for Diagonal Sparse Matrices on GPU [C]. Sun Xiangzheng, Zhang Yunquan, Wang Ting. International Conference on Parallel Processing, 2011

5. Developing a New Storage Format and a Warp-Based SpMV Kernel for Configuration Interaction Sparse Matrices on the GPU [D]. Mahmoud Mohammed. 2018

6. Sparsity estimation from compressive projections via sparse random matrices [O]. Chiara Ravazzi, Sophie Fosson, Tiziano Bianchi

7. Sparse matrix partitioning for optimizing SpMV on CPU-GPU heterogeneous platforms [O]. Akrem Benatia, Weixing Ji, Yizhuo Wang. 2019
