Conference: IEEE International Conference on Electronics, Electrical Engineering and Computing

Efficient Sparse Matrix-Vector Multiplication on GPUs using the CSR Format, Pinned Memory and Overlap Data Transfer

Abstract

The performance of sparse matrix-vector multiplication (SpMV) is important to computational scientists. However, SpMV on graphics processing units (GPUs) performs poorly because of irregular memory access patterns, load imbalance, and reduced parallelism. Researchers who have tried to optimize SpMV with storage formats other than CSR (Compressed Sparse Row) have incurred extra time converting between formats. We propose instead to optimize SpMV by reducing the latency of copying data between host and device. We present CSR-Async, a new program that uses CSR-Vector for the GPU kernel code, uses pinned memory for the host vectors, and makes asynchronous copies from host to device and vice versa, using non-default streams and overlapping data transfers. CSR-Async outperforms CSR-Vector and CSR-Scalar: it is 2.26 and 1.73 times faster, respectively.
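The ingredients named in the abstract can be sketched in CUDA as follows. This is a minimal illustration, not the authors' CSR-Async code: the kernel name `spmv_csr_vector`, the `CHECK` macro, the 3x3 test matrix, the block size of 128, and the use of a single stream are all assumptions made for the example. It shows a CSR-Vector style kernel (one warp per row), pinned host vectors allocated with `cudaMallocHost`, and `cudaMemcpyAsync` on a non-default stream.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report a failed CUDA runtime call and exit main.
#define CHECK(call) do { cudaError_t e = (call); if (e != cudaSuccess) { \
    std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, cudaGetErrorString(e)); \
    return 1; } } while (0)

// CSR-Vector style kernel: one 32-thread warp cooperates on each matrix row.
// Assumes blockDim.x is a multiple of 32 so that every warp maps to one row.
__global__ void spmv_csr_vector(int n_rows, const int* row_ptr, const int* col_idx,
                                const float* val, const float* x, float* y) {
    int tid  = blockIdx.x * blockDim.x + threadIdx.x;
    int row  = tid / 32;          // warp index selects the row
    int lane = tid % 32;          // position of this thread inside the warp
    if (row >= n_rows) return;    // uniform per warp, so the shuffle below is safe
    float sum = 0.0f;
    // Each lane accumulates a strided slice of the row's nonzeros.
    for (int j = row_ptr[row] + lane; j < row_ptr[row + 1]; j += 32)
        sum += val[j] * x[col_idx[j]];
    // Warp-level tree reduction of the 32 partial sums into lane 0.
    for (int off = 16; off > 0; off >>= 1)
        sum += __shfl_down_sync(0xffffffffu, sum, off);
    if (lane == 0) y[row] = sum;
}

int main() {
    // Assumed test data: A = [[1,0,2],[0,3,0],[4,0,5]] in CSR, x = [1,1,1].
    const int n = 3, nnz = 5;
    const int   h_row_ptr[n + 1] = {0, 2, 3, 5};
    const int   h_col_idx[nnz]   = {0, 2, 1, 0, 2};
    const float h_val[nnz]       = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f};

    // Pinned (page-locked) host vectors, needed for truly asynchronous copies.
    float *h_x = nullptr, *h_y = nullptr;
    CHECK(cudaMallocHost(&h_x, n * sizeof(float)));
    CHECK(cudaMallocHost(&h_y, n * sizeof(float)));
    for (int i = 0; i < n; ++i) h_x[i] = 1.0f;

    int *d_row_ptr = nullptr, *d_col_idx = nullptr;
    float *d_val = nullptr, *d_x = nullptr, *d_y = nullptr;
    CHECK(cudaMalloc(&d_row_ptr, (n + 1) * sizeof(int)));
    CHECK(cudaMalloc(&d_col_idx, nnz * sizeof(int)));
    CHECK(cudaMalloc(&d_val, nnz * sizeof(float)));
    CHECK(cudaMalloc(&d_x, n * sizeof(float)));
    CHECK(cudaMalloc(&d_y, n * sizeof(float)));

    // The matrix is uploaded once with ordinary blocking copies.
    CHECK(cudaMemcpy(d_row_ptr, h_row_ptr, (n + 1) * sizeof(int), cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_col_idx, h_col_idx, nnz * sizeof(int), cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_val, h_val, nnz * sizeof(float), cudaMemcpyHostToDevice));

    // Non-default stream: copy in, kernel, and copy out are queued without blocking the host.
    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));
    CHECK(cudaMemcpyAsync(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice, stream));
    const int threads = 128;                               // multiple of 32
    const int blocks  = (n * 32 + threads - 1) / threads;  // one warp per row
    spmv_csr_vector<<<blocks, threads, 0, stream>>>(n, d_row_ptr, d_col_idx, d_val, d_x, d_y);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpyAsync(h_y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost, stream));
    CHECK(cudaStreamSynchronize(stream));                  // wait before reading h_y

    for (int i = 0; i < n; ++i) std::printf("y[%d] = %g\n", i, h_y[i]);  // 3, 3, 9

    CHECK(cudaStreamDestroy(stream));
    CHECK(cudaFree(d_row_ptr)); CHECK(cudaFree(d_col_idx)); CHECK(cudaFree(d_val));
    CHECK(cudaFree(d_x)); CHECK(cudaFree(d_y));
    CHECK(cudaFreeHost(h_x)); CHECK(cudaFreeHost(h_y));
    return 0;
}
```

With a single stream, the three queued operations still run in order, so this sketch only frees the host thread while they execute. Overlapping transfers with kernel execution, as the abstract describes, would additionally require splitting the work across several non-default streams; how CSR-Async partitions that work is not stated in the abstract.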
