Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Format, Pinned Memory and Overlap Data Transfer

Abstract

The performance of sparse matrix-vector multiplication (SpMV) is important to computational scientists. However, SpMV on graphics processing units (GPUs) performs poorly because of irregular memory access patterns, load imbalance, and reduced parallelism. Researchers who have tried to optimize SpMV using storage formats other than CSR (Compressed Sparse Row) have incurred extra time converting between formats. We propose to optimize SpMV by reducing the latency of copying data between host and device. We present CSR-Async, a new program that uses CSR-Vector for the GPU kernel code, uses pinned memory for the host vectors, and makes asynchronous copies from host to device and vice versa, using non-default streams to overlap data transfers. CSR-Async performs better than CSR-Vector and CSR-Scalar: it is 2.26 and 1.73 times faster, respectively.
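
The two baseline kernels named in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' code: it is a plain Python model, written under the assumption that CSR-Scalar assigns one thread per row and CSR-Vector assigns one warp per row with a final in-warp reduction. The names `row_ptr`, `col_idx`, `values`, and both function names are hypothetical. Pinned memory and stream overlap are CUDA runtime features (for example `cudaMallocHost` and `cudaMemcpyAsync`) and are not modeled here.

```python
WARP_SIZE = 32  # lanes cooperating on one row, mirroring a CUDA warp


def spmv_csr_scalar(row_ptr, col_idx, values, x):
    """Model of CSR-Scalar: one 'thread' walks all nonzeros of its row."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def spmv_csr_vector(row_ptr, col_idx, values, x, warp_size=WARP_SIZE):
    """Model of CSR-Vector: lanes take strided nonzeros, then tree-reduce.

    warp_size is assumed to be a power of two.
    """
    y = []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        partial = [0.0] * warp_size
        # Each lane accumulates the nonzeros start+lane, start+lane+warp_size, ...
        for lane in range(warp_size):
            for k in range(start + lane, end, warp_size):
                partial[lane] += values[k] * x[col_idx[k]]
        # Tree reduction across lanes; lane 0 ends up with the row's dot product.
        offset = warp_size // 2
        while offset > 0:
            for lane in range(offset):
                partial[lane] += partial[lane + offset]
            offset //= 2
        y.append(partial[0])
    return y


# 4x4 example matrix in CSR form:
# [[10, 0, 0, 2],
#  [ 0, 3, 0, 0],
#  [ 0, 0, 0, 0],
#  [ 1, 0, 4, 5]]
row_ptr = [0, 2, 3, 3, 6]
col_idx = [0, 3, 1, 0, 2, 3]
values = [10.0, 2.0, 3.0, 1.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0, 4.0]

print(spmv_csr_scalar(row_ptr, col_idx, values, x))
print(spmv_csr_vector(row_ptr, col_idx, values, x))
```

Both functions compute the same product; the difference is only in how the work for one row is divided. Because this sketch works directly on the CSR arrays, it also shows why no format conversion is needed before the multiplication.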

Bibliographic record

Source
IEEE International Conference on Electronics, Electrical Engineering and Computing | 2019 | pp. 1-4 | 4 pages
Authors
Herwin Alayn Huillcen Baca; Flor de Luz Palomino Valdivia
Original format: PDF
Keywords
Sparse matrices; Graphics processing units; Data transfer; Performance evaluation; Kernel; Mathematical model; Arrays

Similar literature

1. Efficient CSR-Based Sparse Matrix-Vector Multiplication on GPU [J]. He Guixia, Gao Jiaquan. Mathematical Problems in Engineering. 2016, Issue PTa11
2. A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units [J]. Kreutzer Moritz, Hager Georg, Wellein Gerhard. SIAM Journal on Scientific Computing. 2014, Issue 5
3. A Novel CSR-Based Sparse Matrix-Vector Multiplication on GPUs [J]. He Guixia, Gao Jiaquan. Mathematical Problems in Engineering. 2016, Issue pta4
4. Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Format, Pinned Memory and Overlap Data Transfer [C]. Herwin Alayn Huillcen Baca, Flor de Luz Palomino Valdivia. IEEE International Conference on Electronics, Electrical Engineering and Computing. 2019
5. Analysis of High Performance Sparse Matrix-Vector Multiplication for Small Finite Fields [D]. Lambert, Matthew A. 2020
6. Fast and efficient fully 3D PET image reconstruction using sparse system matrix factorization with GPU acceleration [O]. Jian Zhou, Jinyi Qi
7. A Novel CSR-Based Sparse Matrix-Vector Multiplication on GPUs [O]. Guixia He, Jiaquan Gao. 2016