Conference: IEEE International Conference on Electronics, Electrical Engineering and Computing
Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Format, Pinned Memory and Overlap Data Transfer

Abstract

The performance of sparse matrix-vector multiplication (SpMV) is important to computational scientists. However, SpMV on graphics processing units (GPUs) performs poorly because of irregular memory access patterns, load imbalance, and reduced parallelism. Researchers who have tried to optimize SpMV with storage formats other than CSR (Compressed Sparse Row) have incurred extra time converting between formats. We propose instead to optimize SpMV by reducing the latency of copying data between host and device. We present CSR-Async, a new program that uses CSR-Vector for the GPU kernel code, allocates the host vectors in pinned memory, and makes asynchronous copies from host to device and vice versa, using non-default streams to overlap data transfers. CSR-Async outperforms CSR-Vector and CSR-Scalar, running 2.26 and 1.73 times faster, respectively.
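For readers unfamiliar with the CSR layout mentioned in the abstract, the following is a minimal CPU-side sketch in plain Python of SpMV over a CSR matrix. It is not the authors' CSR-Async code and involves no GPU, pinned memory, or streams; the function name `csr_spmv` and the array names `row_ptr`, `col_idx`, and `values` are assumptions chosen for illustration. The outer loop over rows is the work that GPU kernels typically distribute: in the usual description, CSR-Scalar assigns one thread per row, while CSR-Vector assigns a group of threads (commonly a warp) per row.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    (All names here are illustrative, not taken from the paper.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Walk only the stored nonzeros of this row.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Example matrix (3x3, 5 nonzeros):
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_spmv(row_ptr, col_idx, values, x)
print(y)  # [7.0, 4.0, 18.0]
```

Rows with very different numbers of nonzeros make the inner loop lengths uneven, which is one source of the load imbalance the abstract refers to.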