Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Format, Pinned Memory and Overlap Data Transfer

Abstract

The performance of sparse matrix-vector multiplication (SpMV) is important to computational scientists. However, SpMV on graphics processing units (GPUs) performs poorly because of irregular memory access patterns, load imbalance, and reduced parallelism. Researchers who have tried to optimize SpMV with storage formats other than CSR (Compressed Sparse Row) have incurred extra time converting between formats. We propose instead to optimize SpMV by reducing the latency of copying data between host and device. We present CSR-Async, a new program that uses the CSR-Vector approach for the GPU kernel code, uses pinned memory for the host vectors, and makes asynchronous copies from host to device and vice versa, using non-default streams to overlap data transfers. CSR-Async performs better than CSR-Vector and CSR-Scalar: it is 2.26 and 1.73 times faster, respectively.
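The two baseline kernels named in the abstract differ in how they assign work to rows of a CSR matrix. The following is a minimal CPU-side sketch in Python of those two access patterns, not the authors' GPU code: CSR-Scalar gives each row to one thread, while CSR-Vector lets the lanes of one warp stride across a row and then reduces their partial sums. The function names, the `warp_size` parameter, and the small example matrix are assumptions made for illustration; the pinned-memory and stream-based transfer part of CSR-Async is not modeled here.

```python
WARP_SIZE = 32  # lanes per warp on NVIDIA GPUs


def spmv_csr_scalar(row_ptr, col_idx, values, x):
    """CSR-Scalar pattern: one (emulated) thread walks one whole row."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):  # each iteration stands in for one GPU thread
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def spmv_csr_vector(row_ptr, col_idx, values, x, warp_size=WARP_SIZE):
    """CSR-Vector pattern: one (emulated) warp cooperates on one row.

    warp_size must be a power of two for the tree reduction below.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):  # each iteration stands in for one warp
        start, end = row_ptr[row], row_ptr[row + 1]
        partial = [0.0] * warp_size
        # Lane i handles nonzeros start+i, start+i+warp_size, ...
        for lane in range(warp_size):
            for k in range(start + lane, end, warp_size):
                partial[lane] += values[k] * x[col_idx[k]]
        # Tree reduction of the per-lane partial sums into lane 0.
        offset = warp_size // 2
        while offset > 0:
            for lane in range(offset):
                partial[lane] += partial[lane + offset]
            offset //= 2
        y[row] = partial[0]
    return y


# Hypothetical 4x4 example matrix in CSR form:
# [[10, 0, 0, 2],
#  [ 0, 3, 0, 0],
#  [ 0, 0, 0, 0],
#  [ 1, 0, 4, 5]]
row_ptr = [0, 2, 3, 3, 6]
col_idx = [0, 3, 1, 0, 2, 3]
values = [10.0, 2.0, 3.0, 1.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0, 4.0]

print(spmv_csr_scalar(row_ptr, col_idx, values, x))  # [18.0, 6.0, 0.0, 33.0]
print(spmv_csr_vector(row_ptr, col_idx, values, x))  # [18.0, 6.0, 0.0, 33.0]
```

On a GPU, the strided access in the vector variant lets neighboring lanes read neighboring entries of `values` and `col_idx`, which is the usual motivation for preferring it on rows with many nonzeros.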

Bibliographic record

Source
IEEE International Conference on Electronics, Electrical Engineering and Computing | 2019 | 283 p. | 4 pages in total
Authors
Herwin Alayn Huillcen Baca; Flor de Luz Palomino Valdivia
Original format: PDF
Chinese Library Classification: Security and confidentiality
Keywords
Sparse matrices; Graphics processing units; Data transfer; Performance evaluation; Kernel; Mathematical model; Arrays

Similar literature

1. Efficient CSR-Based Sparse Matrix-Vector Multiplication on GPU [J]. He Guixia, Gao Jiaquan. Mathematical Problems in Engineering, 2016, Issue PTa11
2. A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units [J]. Kreutzer Moritz, Hager Georg, Wellein Gerhard. SIAM Journal on Scientific Computing, 2014, Issue 5
3. A Novel CSR-Based Sparse Matrix-Vector Multiplication on GPUs [J]. He Guixia, Gao Jiaquan. Mathematical Problems in Engineering, 2016, Issue pta4
4. Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Format, Pinned Memory and Overlap Data Transfer [C]. Herwin Alayn Huillcen Baca, Flor de Luz Palomino Valdivia. IEEE International Conference on Electronics, Electrical Engineering and Computing, 2019
5. Analysis of High Performance Sparse Matrix-Vector Multiplication for Small Finite Fields [D]. Lambert, Matthew A. 2020
6. Fast and efficient fully 3D PET image reconstruction using sparse system matrix factorization with GPU acceleration [O]. Jian Zhou, Jinyi Qi
7. A Novel CSR-Based Sparse Matrix-Vector Multiplication on GPUs [O]. Guixia He, Jiaquan Gao. 2016
