Journal: Journal of Signal Processing Systems for Signal, Image, and Video Technology

LightSpMV: Faster CUDA-Compatible Sparse Matrix-Vector Multiplication Using Compressed Sparse Rows

Abstract

Compressed sparse row (CSR) is one of the most frequently used sparse matrix storage formats. However, the efficiency of existing CUDA-compatible CSR-based sparse matrix-vector multiplication (SpMV) implementations is relatively low. We address this issue by presenting LightSpMV, a parallelized CSR-based SpMV implementation written in CUDA C++. The algorithm achieves high speed by using atomic and warp shuffle instructions to implement fine-grained dynamic distribution of matrix rows over vectors/warps, as well as efficient vector dot product computation. Moreover, we propose a unified cache hit rate computation approach to consistently investigate the caching behavior of different SpMV kernels, which may place their data differently in the hierarchical memory space of CUDA-enabled GPUs. We have assessed LightSpMV using a set of sparse matrices and compared it to the CSR-based SpMV kernels in the top-performing CUSP, ViennaCL and cuSPARSE libraries. Our experimental results demonstrate that LightSpMV is superior to CUSP, ViennaCL and cuSPARSE on the same Kepler-based Tesla K40c GPU, running up to 2.63x and 2.65x faster than CUSP, up to 2.52x and 1.96x faster than ViennaCL, and up to 1.94x and 1.79x faster than cuSPARSE in single and double precision, respectively. In addition, when accelerating the PageRank graph application, LightSpMV consistently outperforms these three counterparts. LightSpMV is open-source and publicly available at http://lightspmv.sourceforge.net.
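To make the two ideas named in the abstract more concrete, here is a minimal CPU sketch in C++. It is not LightSpMV's kernel, and every name in it (`CsrMatrix`, `kLanes`, `row_dot`, `spmv_csr`) is hypothetical. It assumes that dynamic row distribution can be modeled by workers claiming rows from a shared atomic counter, and that the warp-shuffle dot product can be modeled by a tree reduction over per-lane partial sums. On a GPU the workers would be vectors/warps and the reduction would use shuffle instructions.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// CSR storage: row_ptr has rows + 1 entries; col_idx and values hold the
// nonzeros row by row.
struct CsrMatrix {
    std::size_t rows;
    std::vector<std::size_t> row_ptr;
    std::vector<std::size_t> col_idx;
    std::vector<double> values;
};

// Hypothetical lane count standing in for the threads of one GPU vector/warp.
// It must be a power of two for the tree reduction below.
constexpr std::size_t kLanes = 4;

// Dot product of one CSR row with x. Each "lane" accumulates a strided partial
// sum; the partials are then combined pairwise, halving the active lanes each
// step, which is the pattern a warp-shuffle reduction follows.
double row_dot(const CsrMatrix& a, const std::vector<double>& x, std::size_t row) {
    double partial[kLanes] = {0.0};
    for (std::size_t lane = 0; lane < kLanes; ++lane)
        for (std::size_t k = a.row_ptr[row] + lane; k < a.row_ptr[row + 1]; k += kLanes)
            partial[lane] += a.values[k] * x[a.col_idx[k]];
    for (std::size_t offset = kLanes / 2; offset > 0; offset /= 2)
        for (std::size_t lane = 0; lane < offset; ++lane)
            partial[lane] += partial[lane + offset];
    return partial[0];
}

// y = A * x. Rows are not assigned statically: each worker claims the next
// unprocessed row with an atomic fetch-add, so workers that finish short rows
// quickly simply pick up more rows.
std::vector<double> spmv_csr(const CsrMatrix& a, const std::vector<double>& x,
                             unsigned num_workers = 4) {
    assert(a.row_ptr.size() == a.rows + 1);
    std::vector<double> y(a.rows, 0.0);
    std::atomic<std::size_t> next_row{0};
    auto worker = [&]() {
        for (;;) {
            const std::size_t row = next_row.fetch_add(1);
            if (row >= a.rows) break;
            y[row] = row_dot(a, x, row);  // each row is written by exactly one worker
        }
    };
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < num_workers; ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
    return y;
}
```

For the 3x3 matrix with rows (1, 0, 2), (0, 3, 0), (4, 0, 5), the CSR arrays are `row_ptr = {0, 2, 3, 5}`, `col_idx = {0, 2, 1, 0, 2}` and `values = {1, 2, 3, 4, 5}`; multiplying by `x = {1, 2, 3}` with `spmv_csr` gives `{7, 6, 19}`. Claiming one row per atomic operation keeps the sketch short; a real kernel might claim several rows at once to reduce contention on the counter.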
