
An Effective Approach for Implementing Sparse Matrix-Vector Multiplication on Graphics Processing Units

Abstract

Sparse matrix-vector multiplication (SpMV) is often a performance bottleneck in iterative solvers. Recently, Graphics Processing Units (GPUs) have been deployed to enhance the performance of this operation. We present a Blocked Transposed Jagged Diagonal (BTJAD) storage format, a blocked version of the Transposed Jagged Diagonal format tailored for GPUs. We develop a highly optimized SpMV kernel that exploits the properties of the BTJAD storage format and reuses loaded values of the source vector in the registers of a GPU. Using 62 matrices with different sparsity patterns and executing on an NVIDIA Tesla T10 GPU, we compare the performance of our kernel with that of the SpMV kernels in NVIDIA's library. Our kernel achieves superior execution throughput for matrices whose nonzero row lengths are non-uniform, outperforming the best available kernels by up to 4.67x. When executing on the Fermi-class GeForce GTX480 GPU, which has a larger register file, the maximum speedup achieved by our kernel improves to 6.6x.
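To make the storage idea behind the abstract concrete, the following is a minimal CPU sketch of SpMV using a jagged-diagonal-style layout: rows are permuted by decreasing nonzero count, and the matrix is stored as "jagged diagonals" (the d-th nonzero of every row that has one), which is the access pattern a GPU kernel would traverse. The function names and data layout here are hypothetical illustrations, not the paper's BTJAD format or kernel.

```python
def to_jad(rows, n):
    """Convert a row-wise sparse matrix to a jagged-diagonal-style layout.

    rows: list of n rows, each a list of (col, val) nonzeros.
    Returns (perm, jdiags), where perm[i] is the original index of the
    i-th row after sorting by decreasing nonzero count, and jdiags[d]
    holds the d-th nonzero of every row long enough to have one.
    """
    perm = sorted(range(n), key=lambda r: -len(rows[r]))
    maxlen = max((len(r) for r in rows), default=0)
    jdiags = []
    for d in range(maxlen):
        diag = [(i, rows[perm[i]][d])          # (sorted-row index, (col, val))
                for i in range(n) if d < len(rows[perm[i]])]
        jdiags.append(diag)
    return perm, jdiags


def spmv_jad(perm, jdiags, x, n):
    """y = A @ x, traversing one jagged diagonal at a time."""
    y = [0.0] * n
    for diag in jdiags:
        for i, (col, val) in diag:
            y[perm[i]] += val * x[col]         # scatter back via the permutation
    return y
```

Traversing diagonal by diagonal gives consecutive threads (here, loop iterations) contiguous work even when row lengths differ, which is why jagged-diagonal variants suit matrices with non-uniform nonzero row lengths.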

