Conference: IEEE International Conference on Application-Specific Systems, Architectures and Processors

LightSpMV: Faster CSR-based sparse matrix-vector multiplication on CUDA-enabled GPUs


Abstract

Compressed sparse row (CSR) is a frequently used format for sparse matrix storage. However, the state-of-the-art CSR-based sparse matrix-vector multiplication (SpMV) implementations on CUDA-enabled GPUs do not exhibit very high efficiency. This has motivated the development of some alternative storage formats for GPU computing. Unfortunately, these alternatives are incompatible with most CPU-centric programs and require dynamic conversion from CSR at runtime, thus incurring significant computational and storage overheads. We present LightSpMV, a novel CUDA-compatible SpMV algorithm using the standard CSR format, which achieves high speed by benefiting from the fine-grained dynamic distribution of matrix rows over warps/vectors. In LightSpMV, two dynamic row distribution approaches have been investigated at the vector and warp levels with atomic operations and warp shuffle functions as the fundamental building blocks. We have evaluated LightSpMV using various sparse matrices and further compared it to the CSR-based SpMV subprograms in the state-of-the-art CUSP and cuSPARSE libraries. Performance evaluation reveals that on the same Tesla K40c GPU, LightSpMV is superior to both CUSP and cuSPARSE, with a speedup of up to 2.60 and 2.63 over CUSP, and up to 1.93 and 1.79 over cuSPARSE for single and double precision, respectively. LightSpMV is available at http://lightspmv.sourceforge.net.
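To make the idea in the abstract concrete, the following is a minimal sequential sketch in Python, not the authors' CUDA kernel. It shows a plain CSR SpMV and a simulation of dynamic row distribution, in which each "vector" (a small group of lanes) fetches its next row from a shared counter. The function names, the `num_vectors` and `vector_size` parameters, and the round-robin scheduling are all assumptions made for illustration. On a GPU, the shared counter would be advanced with an atomic operation and the per-row reduction would use warp shuffle functions, the building blocks named in the abstract.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Reference y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            y[row] += values[k] * x[col_idx[k]]
    return y


def spmv_csr_dynamic(row_ptr, col_idx, values, x, num_vectors=2, vector_size=2):
    """Sequential simulation of dynamic row distribution (illustrative only).

    vector_size must be a power of two so the tree reduction is valid.
    """
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    next_row = 0  # shared row counter; a GPU kernel would use an atomic add
    active = list(range(num_vectors))
    while active:
        # Round-robin stands in for hardware scheduling; on a real GPU the
        # order in which vectors fetch rows depends on when each one finishes.
        for v in list(active):
            row = next_row  # fetch-and-increment (assumed atomic on a GPU)
            next_row += 1
            if row >= num_rows:
                active.remove(v)  # no rows left for this vector
                continue
            start, end = row_ptr[row], row_ptr[row + 1]
            # Each lane accumulates a strided subset of the row's nonzeros.
            partial = [0.0] * vector_size
            for lane in range(vector_size):
                for k in range(start + lane, end, vector_size):
                    partial[lane] += values[k] * x[col_idx[k]]
            # Tree reduction across lanes; stands in for warp shuffle.
            offset = vector_size // 2
            while offset > 0:
                for lane in range(offset):
                    partial[lane] += partial[lane + offset]
                offset //= 2
            y[row] = partial[0]
    return y


# Example: A = [[1, 0, 2, 0], [0, 3, 0, 0], [4, 0, 5, 6], [0, 0, 0, 7]]
row_ptr = [0, 2, 3, 6, 7]
col_idx = [0, 2, 1, 0, 2, 3, 3]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x = [1.0, 2.0, 3.0, 4.0]

print(spmv_csr(row_ptr, col_idx, values, x))
print(spmv_csr_dynamic(row_ptr, col_idx, values, x))
```

Both calls should produce the same result vector. The point of the second version is that no row is assigned to a vector in advance, so a vector that finishes a short row immediately picks up the next available one instead of idling while another vector works through a long row.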