CUDA-enabled Sparse Matrix-Vector Multiplication on GPUs using atomic operations

Hoang-Vu Dang; Bertil Schmidt

Abstract

Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU outperform their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA implementation that performs SpMV on the GPU using atomic operations. We compare SCOO performance to the existing formats of the NVIDIA Cusp library using large sparse matrices. Our results for single-precision floating-point matrices show that, on a single GPU, SCOO outperforms the COO and CSR formats for all tested matrices and the HYB format for all tested unstructured matrices. Furthermore, our dual-GPU implementation achieves an average efficiency of 94%. Because existing CUDA-enabled GPUs perform atomic operations on double-precision floating-point numbers more slowly, the double-precision SCOO implementation does not outperform the other formats on every unstructured matrix. Overall, on a Tesla C2075, the average speedup of SCOO for the tested benchmark dataset is 3.33 (1.56) compared to CSR, 5.25 (2.42) compared to COO, and 2.39 (1.37) compared to HYB for single (double) precision. Furthermore, a comparison with a Sandy Bridge CPU shows that SCOO on a Fermi GPU outperforms the multithreaded CSR implementation of the Intel MKL library on an i7-2700K by a factor of between 5.5 (2.3) and 18 (12.7) for single (double) precision.
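The abstract does not spell out the SCOO layout, so the following is only a rough, sequential model of the general idea, not the authors' implementation. It assumes rows are grouped into fixed-height slices (the `slice_height` parameter is our assumption), that the non-zeros inside each slice are ordered by column so that neighbouring entries read nearby elements of x, and that partial row sums are accumulated in a per-slice buffer before being written to y. On a GPU that buffer would sit in shared memory, and because several threads can update the same row concurrently, the updates would need `atomicAdd`; this Python model has no concurrency and only mirrors the data flow. The function names are hypothetical.

```python
def to_sliced_coo(rows, cols, vals, n_rows, slice_height):
    """Group COO triplets into row slices (hypothetical SCOO-like layout).

    Returns a list of (row_start, entries), where entries is a list of
    (local_row, col, val) tuples sorted by column, then by local row.
    """
    n_slices = (n_rows + slice_height - 1) // slice_height
    buckets = [[] for _ in range(n_slices)]
    for r, c, v in zip(rows, cols, vals):
        s = r // slice_height
        # Store the row index relative to the start of its slice.
        buckets[s].append((r - s * slice_height, c, v))
    slices = []
    for s, entries in enumerate(buckets):
        # Column-ordered entries: on a GPU, adjacent threads would then
        # read nearby elements of x, at the cost of conflicting row updates.
        entries.sort(key=lambda e: (e[1], e[0]))
        slices.append((s * slice_height, entries))
    return slices


def spmv_sliced_coo(slices, x, n_rows, slice_height):
    """Compute y = A @ x from the sliced layout (sequential model)."""
    y = [0.0] * n_rows
    for row_start, entries in slices:
        # Stand-in for a per-slice shared-memory accumulator.
        local = [0.0] * slice_height
        for local_row, c, v in entries:
            # On a GPU, concurrent threads may hit the same local_row,
            # so this update would be an atomicAdd on shared memory.
            local[local_row] += v * x[c]
        # Write the finished slice back to y, skipping padding rows.
        for i in range(slice_height):
            r = row_start + i
            if r < n_rows:
                y[r] = local[i]
    return y


# Usage: A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] in COO form.
rows = [0, 0, 1, 2, 2]
cols = [0, 2, 1, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
slices = to_sliced_coo(rows, cols, vals, n_rows=3, slice_height=2)
print(spmv_sliced_coo(slices, [1.0, 1.0, 1.0], 3, 2))  # [3.0, 3.0, 9.0]
```

In this model, ordering by column trades conflict-free row writes for more regular access to x, which is why the accumulation step needs atomics on real hardware. Whether the trade pays off depends on atomic throughput, consistent with the abstract's weaker double-precision results.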

Bibliographic Information

Source
Parallel Computing | 2013, Issue 11 | pp. 737-748 | 12 pages
Authors
Hoang-Vu Dang; Bertil Schmidt
Affiliation (both authors)
Institut fuer Informatik, Johannes Gutenberg Universitaet, Staudingerweg 9, 55128 Mainz, Germany
Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Keywords
GPU computing; CUDA; Sparse matrices; Sparse Matrix-Vector Multiplication; Scientific programming;

Similar Literature

1. The Sliced COO Format for Sparse Matrix-Vector Multiplication on CUDA-enabled GPUs [J]. Hoang-Vu Dang, Bertil Schmidt. Procedia Computer Science, 2012, Issue 1.
2. GPU accelerated sparse matrix-vector multiplication and sparse matrix-transpose vector multiplication [J]. Yuan Tao, Yangdong Deng, Shuai Mu, et al. Concurrency and Computation: Practice and Experience, 2015, Issue 14.
3. Performance Prediction Based on Statistics of Sparse Matrix-Vector Multiplication on GPUs [J]. Ruixing Wang, Tongxiang Gu, Ming Li. Journal of Computer and Communications, 2017, Issue 6.
4. LightSpMV: Faster CSR-based sparse matrix-vector multiplication on CUDA-enabled GPUs [C]. Yongchao Liu, Bertil Schmidt. IEEE International Conference on Application-Specific Systems, Architectures and Processors, 2015.
5. Analysis of High Performance Sparse Matrix-Vector Multiplication for Small Finite Fields [D]. Matthew A. Lambert, 2020.
6. Accelerating metagenomic read classification on CUDA-enabled GPUs [O]. Robin Kobus, Christian Hundt, André Müller, et al., 2017.
7. The Sliced COO Format for Sparse Matrix-Vector Multiplication on CUDA-enabled GPUs [O]. Hoang-Vu Dang, Bertil Schmidt, 2012.