Conference: International Conference for High Performance Computing, Networking, Storage and Analysis

Efficient Sparse Matrix-Vector Multiplication on GPUs Using the CSR Storage Format

Abstract

The performance of sparse matrix-vector multiplication (SpMV) is important to computational scientists. Compressed sparse row (CSR) is the most frequently used format to store sparse matrices. However, CSR-based SpMV on graphics processing units (GPUs) has poor performance due to irregular memory access patterns, load imbalance, and reduced parallelism. This has led researchers to propose new storage formats. Unfortunately, dynamically transforming CSR into these formats has significant runtime and storage overheads. We propose a novel algorithm, CSR-Adaptive, which keeps the CSR format intact and maps well to GPUs. Our implementation addresses the aforementioned challenges by (i) efficiently accessing DRAM by streaming data into the local scratchpad memory and (ii) dynamically assigning different numbers of rows to each parallel GPU compute unit. CSR-Adaptive achieves an average speedup of 14.7× over existing CSR-based algorithms and 2.3× over clSpMV cocktail, which uses an assortment of matrix formats.
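To make the two ideas in the abstract concrete, here is a minimal sequential sketch in Python. It is not the authors' GPU implementation, whose kernels are not described on this page. It only illustrates (a) a plain row-by-row CSR SpMV and (b) one plausible way to group a variable number of consecutive rows into blocks whose nonzeros fit a fixed-size buffer, which stands in for the scratchpad. The names `csr_spmv`, `make_row_blocks`, `csr_spmv_blocked`, and the `max_nnz` parameter are assumptions introduced for illustration.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Baseline y = A @ x for A stored in CSR, one row at a time."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[row] = acc
    return y


def make_row_blocks(row_ptr, max_nnz):
    """Greedily group consecutive rows so each block holds at most
    max_nnz nonzeros. A single row longer than max_nnz gets a block
    of its own. Returns block boundaries as row indices.
    (Hypothetical helper; the blocking rule is an assumption.)"""
    n_rows = len(row_ptr) - 1
    blocks = [0]
    start = 0
    for row in range(n_rows):
        if row_ptr[row + 1] - row_ptr[start] > max_nnz:
            if row > start:
                # Close the current block just before this row.
                blocks.append(row)
                start = row
            if row_ptr[row + 1] - row_ptr[start] > max_nnz:
                # The row alone is too long: it becomes its own block.
                blocks.append(row + 1)
                start = row + 1
    if blocks[-1] != n_rows:
        blocks.append(n_rows)
    return blocks


def csr_spmv_blocked(row_ptr, col_idx, vals, x, blocks):
    """SpMV over row blocks. Each block's products are first written
    to a contiguous buffer (a stand-in for scratchpad memory), then
    reduced per row."""
    y = [0.0] * (len(row_ptr) - 1)
    for b in range(len(blocks) - 1):
        first, last = blocks[b], blocks[b + 1]
        base = row_ptr[first]
        scratch = [vals[k] * x[col_idx[k]]
                   for k in range(base, row_ptr[last])]
        for row in range(first, last):
            lo, hi = row_ptr[row] - base, row_ptr[row + 1] - base
            y[row] = sum(scratch[lo:hi], 0.0)
    return y


# Small 5x5 example with rows of 2, 1, 0, 5, and 1 nonzeros.
row_ptr = [0, 2, 3, 3, 8, 9]
col_idx = [0, 2, 1, 0, 1, 2, 3, 4, 4]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
x = [1.0, 2.0, 3.0, 4.0, 5.0]

blocks = make_row_blocks(row_ptr, max_nnz=4)
print(blocks)                                            # [0, 3, 4, 5]
print(csr_spmv_blocked(row_ptr, col_idx, vals, x, blocks))
# [7.0, 6.0, 0.0, 100.0, 45.0]
```

In this sketch the three short rows share one block, while the five-nonzero row gets a block to itself. On a GPU, each block would presumably map to one compute unit, so that short rows are batched and long rows do not stall their neighbors. The CSR arrays themselves are never converted to another format; only the small `blocks` array is added.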
