LightSpMV: Faster CUDA-Compatible Sparse Matrix-Vector Multiplication Using Compressed Sparse Rows

Abstract

Compressed sparse row (CSR) is one of the most frequently used sparse matrix storage formats. However, the efficiency of existing CUDA-compatible CSR-based sparse matrix-vector multiplication (SpMV) implementations is relatively low. We address this issue by presenting LightSpMV, a parallelized CSR-based SpMV implementation programmed in CUDA C++. This algorithm achieves high speed by employing atomic and warp shuffle instructions to implement fine-grained dynamic distribution of matrix rows over vectors/warps as well as efficient vector dot product computation. Moreover, we propose a unified cache hit rate computation approach to consistently investigate the caching behavior for different SpMV kernels, which may have different data deployment in the hierarchical memory space of CUDA-enabled GPUs. We have assessed LightSpMV using a set of sparse matrices and further compared it to the CSR-based SpMV kernels in the top-performing CUSP, ViennaCL and cuSPARSE libraries. Our experimental results demonstrate that LightSpMV is superior to CUSP, ViennaCL and cuSPARSE on the same Kepler-based Tesla K40c GPU, running up to 2.63× and 2.65× faster than CUSP, up to 2.52× and 1.96× faster than ViennaCL, and up to 1.94× and 1.79× faster than cuSPARSE with respect to single and double precision, respectively. In addition, for the acceleration of the PageRank graph application, LightSpMV remains consistently superior to the aforementioned three counterparts. LightSpMV is open-source and publicly available at http://lightspmv.sourceforge.net.
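
For readers unfamiliar with the CSR format named in the abstract, the following is a minimal sequential sketch of CSR-based SpMV in Python. It is not the LightSpMV kernel: the abstract describes a CUDA C++ implementation that distributes rows dynamically over vectors/warps and uses warp shuffle instructions for the per-row dot product, none of which is modeled here. The function name `csr_spmv` and the array names `row_ptr`, `col_idx` and `values` are illustrative assumptions, not identifiers taken from the paper or its source code.

```python
from typing import List, Sequence


def csr_spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A * x for a matrix A stored in CSR form (sequential sketch)."""
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        # The nonzeros of this row occupy positions row_ptr[row] .. row_ptr[row + 1] - 1.
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Example 3x3 matrix (hypothetical data):
#   [[10, 0, 2],
#    [ 0, 3, 0],
#    [ 1, 0, 4]]
row_ptr = [0, 2, 3, 5]            # start offset of each row, plus one end marker
col_idx = [0, 2, 1, 0, 2]         # column index of each stored nonzero
values = [10.0, 2.0, 3.0, 1.0, 4.0]
x = [1.0, 2.0, 3.0]

y = csr_spmv(row_ptr, col_idx, values, x)
print(y)  # [16.0, 6.0, 13.0]
```

Each row is an independent dot product, which is why a GPU implementation can assign rows to groups of threads. Because row lengths vary, a static assignment can leave some groups idle, and the dynamic row distribution described in the abstract targets that imbalance.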

Bibliographic details

Source
Journal of signal processing systems for signal, image, and video technology | 2018, Issue 1 | pp. 69-86 | 18 pages
Authors
Liu Yongchao; Schmidt Bertil
Affiliations

Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA;

Johannes Gutenberg Univ Mainz, Inst Comp Sci, D-55128 Mainz, Germany;

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
Sparse matrix-vector multiplication; Compressed sparse row; CUDA; GPU

Similar literature

1. Sparse matrix multiplication: The distributed block-compressed sparse row library [J]. Urban Borstnik, Joost VandeVondele, Valery Weber. Parallel Computing, 2014, Issue 5-6.
2. GPU accelerated sparse matrix-vector multiplication and sparse matrix-transpose vector multiplication [J]. Yuan Tao, Yangdong Deng, Shuai Mu. Concurrency and Computation: Practice and Experience, 2015, Issue 14.
3. A unified sparse matrix data format for efficient general sparse matrix-vector multiplication on modern processors with wide SIMD units [J]. Kreutzer Moritz, Hager Georg, Wellein Gerhard. SIAM Journal on Scientific Computing, 2014, Issue 5.
4. LightSpMV: Faster CSR-based sparse matrix-vector multiplication on CUDA-enabled GPUs [C]. Yongchao Liu, Schmidt Bertil. IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2015.
5. Analysis of High Performance Sparse Matrix-Vector Multiplication for Small Finite Fields [D]. Lambert, Matthew A. 2020.
6. A Fast Sparse Recovery Algorithm for Compressed Sensing Using Approximate l0 Norm and Modified Newton Method [O]. Dingfei Jin, Yue Yang, Tao Ge. 2019.
7. Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks [O]. Aydın Buluç, Jeremy T. Fineman, Matteo Frigo. 2009.
