Efficient Sparse Matrix-Vector Multiplication on GPUs Using the CSR Storage Format


Abstract

The performance of sparse matrix vector multiplication (SpMV) is important to computational scientists. Compressed sparse row (CSR) is the most frequently used format to store sparse matrices. However, CSR-based SpMV on graphics processing units (GPUs) has poor performance due to irregular memory access patterns, load imbalance, and reduced parallelism. This has led researchers to propose new storage formats. Unfortunately, dynamically transforming CSR into these formats has significant runtime and storage overheads. We propose a novel algorithm, CSR-Adaptive, which keeps the CSR format intact and maps well to GPUs. Our implementation addresses the aforementioned challenges by (i) efficiently accessing DRAM by streaming data into the local scratchpad memory and (ii) dynamically assigning different numbers of rows to each parallel GPU compute unit. CSR-Adaptive achieves an average speedup of 14.7× over existing CSR-based algorithms and 2.3× over clSpMV cocktail, which uses an assortment of matrix formats.
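The two ideas named in the abstract can be illustrated with a small serial sketch. This is not the authors' GPU implementation: the abstract gives no kernel details, so the function names (`csr_spmv`, `make_row_blocks`, `csr_spmv_blocked`), the `max_nnz` buffer limit, and the greedy grouping rule are all assumptions made for illustration. The sketch groups consecutive CSR rows into blocks whose nonzeros fit a fixed-size buffer (a stand-in for the local scratchpad), so that short rows share a block and a long row gets a block of its own, while the CSR arrays themselves are left unchanged.

```python
from typing import List, Sequence


def csr_spmv(row_ptr: Sequence[int], col_idx: Sequence[int],
             vals: Sequence[float], x: Sequence[float]) -> List[float]:
    """Baseline y = A @ x for a CSR matrix, processed one row at a time."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[row] = acc
    return y


def make_row_blocks(row_ptr: Sequence[int], max_nnz: int) -> List[int]:
    """Greedily group consecutive rows into blocks of at most max_nnz nonzeros.

    Returns block boundaries: block b covers rows blocks[b]..blocks[b+1]-1.
    A single row with more than max_nnz nonzeros is placed alone in a block.
    (Hypothetical grouping rule, chosen only for illustration.)
    """
    n_rows = len(row_ptr) - 1
    blocks = [0]
    start = 0
    for row in range(n_rows):
        if row_ptr[row + 1] - row_ptr[start] > max_nnz:
            if row > start:
                blocks.append(row)       # close the block before this row
                start = row
            if row_ptr[row + 1] - row_ptr[start] > max_nnz:
                blocks.append(row + 1)   # oversized row gets its own block
                start = row + 1
    if start < n_rows:
        blocks.append(n_rows)
    return blocks


def csr_spmv_blocked(row_ptr: Sequence[int], col_idx: Sequence[int],
                     vals: Sequence[float], x: Sequence[float],
                     blocks: Sequence[int]) -> List[float]:
    """y = A @ x computed block by block.

    For each block, the contiguous run of nonzeros is multiplied into a
    temporary list (standing in for scratchpad memory), then reduced per row.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for b in range(len(blocks) - 1):
        first, last = blocks[b], blocks[b + 1]
        lo, hi = row_ptr[first], row_ptr[last]
        products = [vals[k] * x[col_idx[k]] for k in range(lo, hi)]
        for row in range(first, last):
            y[row] = sum(products[row_ptr[row] - lo:row_ptr[row + 1] - lo], 0.0)
    return y
```

For a 4×4 matrix with `row_ptr = [0, 2, 3, 7, 8]` and `max_nnz = 3`, `make_row_blocks` returns `[0, 2, 3, 4]`: the two short rows share one block, the four-element row is isolated, and the last row forms its own block. The blocked routine then produces the same result as the row-at-a-time baseline.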

Bibliographic record

Source
International Conference for High Performance Computing, Networking, Storage and Analysis, 2014, pp. 769-780 (12 pages)
Authors
Greathouse Joseph L.; Daga Mayank
Original format: PDF
Keywords
graphics processing units; mathematics computing; matrix multiplication; parallel processing; sparse matrices; CSR storage format; CSR-adaptive; CSR-based SpMV; DRAM; clSpMV cocktail; compressed sparse row; local scratchpad memory; parallel GPU compute unit; sparse matrix-vector multiplication; streaming data; Bandwidth; Heuristic algorithms; Instruction sets; Random access memory; Vectors; AMD; Sparse matrix-vector multiplication (SpMV); compressed sparse row (CSR); general purpose computation on graphics processing units (GPGPU); performance acceleration

Similar literature

1. Merge-based Sparse Matrix-Vector Multiplication (SpMV) using the CSR Storage Format [J]. Merrill Duane, Garland Michael. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages. 2016, No. 8
2. Efficient CSR-Based Sparse Matrix-Vector Multiplication on GPU [J]. He Guixia, Gao Jiaquan. Mathematical Problems in Engineering. 2016, No. PTa11
3. A Novel CSR-Based Sparse Matrix-Vector Multiplication on GPUs [J]. He Guixia, Gao Jiaquan. Mathematical Problems in Engineering. 2016, No. pta4
4. Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Format, Pinned Memory and Overlap Data Transfer [C]. Herwin Alayn Huillcen Baca, Flor de Luz Palomino Valdivia. IEEE International Conference on Electronics, Electrical Engineering and Computing. 2019
5. Developing a New Storage Format and a Warp-Based SpMV Kernel for Configuration Interaction Sparse Matrices on the GPU [D]. Mahmoud, Mohammed. 2018
6. Hierarchical Orthogonal Matrix Generation and Matrix-Vector Multiplications in Rigid Body Simulations [O]. Fuhui Fang, Jingfang Huang, Gary Huber
7. Merge-based sparse matrix-vector multiplication (SpMV) using the CSR storage format [O]. Duane Merrill, Michael Garland. 2016
