Conference: IEEE International Conference on High Performance Computing and Communication

An Effective Approach for Implementing Sparse Matrix-Vector Multiplication on Graphics Processing Units



Abstract

Sparse matrix-vector multiplication (SpMV) is often a performance bottleneck in iterative solvers. Recently, Graphics Processing Units (GPUs) have been deployed to enhance the performance of this operation. We present a blocked version of the Transposed Jagged Diagonal storage format, tailored for GPUs, which we call BTJAD. We develop a highly optimized SpMV kernel that exploits the properties of the BTJAD storage format and reuses loaded values of the source vector in the registers of a GPU. Using 62 matrices with different sparsity patterns, executing on an NVIDIA Tesla T10 GPU, we compare the performance of our kernel with that of the SpMV kernels in NVIDIA's library. Our kernel achieves superior execution throughput for matrices with non-uniform nonzero row lengths, outperforming the best available kernels by up to 4.67x. When executing on the Fermi-class GeForce GTX480 GPU, which has a larger register file, the maximum speedup achieved by our kernel improves to 6.6x.
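The abstract does not spell out the BTJAD layout itself. For background, the sketch below shows SpMV over classic Jagged Diagonal Storage (JDS), the family of formats that TJAD and BTJAD build on: rows are permuted by decreasing nonzero count and nonzeros are stored one "jagged diagonal" at a time, which is what makes the traversal regular enough for GPU-style processing. This is an illustrative NumPy sketch, not the authors' kernel; all function and variable names are assumptions.

```python
import numpy as np

def to_jds(A):
    """Convert a dense matrix A to Jagged Diagonal Storage (JDS).

    Illustrative only: the paper's BTJAD format is a blocked, transposed
    variant of this idea, optimized for GPU register reuse.
    """
    nnz = (A != 0).sum(axis=1)
    perm = np.argsort(-nnz, kind="stable")       # rows by decreasing nnz count
    cols = [np.flatnonzero(A[r]) for r in perm]  # nonzero columns per permuted row
    values, col_idx, jd_ptr = [], [], [0]
    for d in range(int(nnz.max())):              # emit one jagged diagonal at a time
        for i, c in enumerate(cols):
            if d < len(c):
                values.append(A[perm[i], c[d]])
                col_idx.append(c[d])
        jd_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(jd_ptr), perm

def jds_spmv(values, col_idx, jd_ptr, perm, x, n_rows):
    """Compute y = A @ x from the JDS arrays produced by to_jds."""
    y = np.zeros(n_rows)
    for d in range(len(jd_ptr) - 1):
        start, end = jd_ptr[d], jd_ptr[d + 1]
        # Because rows are sorted by length, diagonal d touches exactly the
        # first (end - start) permuted rows; the inner loop is fully regular.
        for i in range(end - start):
            y[perm[i]] += values[start + i] * x[col_idx[start + i]]
    return y
```

The regular inner loop over each jagged diagonal is what maps well onto GPU threads; the blocked variant in the paper additionally groups diagonals so that loaded elements of the source vector x can be reused from registers.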

