Journal: Concurrency and Computation: Practice and Experience

GPU accelerated sparse matrix-vector multiplication and sparse matrix-transpose vector multiplication


Abstract

Many high performance computing applications require computing both the sparse matrix-vector product (SMVP) and the sparse matrix-transpose vector product (SMTVP) for better overall performance. Under such a circumstance, it is critical to maintain a similarly high throughput for these two computing patterns with the underlying sparse matrix encoded in a single storage format. The compressed sparse block (CSB) format proposed by Buluç et al. allows computing both problems on multi-core CPUs with nearly identical throughputs. On the other hand, a direct porting of CSB to graphics processing units (GPUs), which have been recently recognized as a powerful general purpose computing platform, turns out to be inefficient. In this work, we propose a new data structure, designated as expanded CSB (eCSB), to minimize the throughput gap between SMVP and SMTVP computations on GPUs, while at the same time enabling a high computing throughput. We also use a hybrid storage format to store elements in each block, which can be selected dynamically at runtime. Experimental results show that the proposed techniques implemented on a Kepler GPU deliver similar throughput on both SMVP and SMTVP, and the throughput is up to 13 times faster than that of the CPU-based CSB implementation. In addition, our eCSB procedure outperforms the previous GPU results by up to 188% and 914% in computing SMVP and SMTVP, respectively, and we validate the effectiveness of eCSB by means of the wall-clock time of the bi-conjugate gradient algorithm; our eCSB is 25% faster than Compressed Sparse Rows (CSR) and 6% faster than HYB.
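To illustrate why a single storage format tends to favor one of the two products, the following is a minimal sequential sketch in Python using plain CSR arrays. It is not the eCSB structure from the paper, and the function names and the small test matrix are assumptions chosen only for illustration. With CSR, SMVP gathers from the input vector and writes each output entry once per row, while SMTVP over the same arrays scatters updates across the output vector, an access pattern that is generally harder to parallelize efficiently.

```python
from typing import List, Sequence


def csr_smvp(n_rows: int, row_ptr: Sequence[int], col_idx: Sequence[int],
             vals: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A x for A stored in CSR (illustrative helper name)."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Row i owns the contiguous slice row_ptr[i] .. row_ptr[i + 1] - 1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]  # gather from x
        y[i] = acc  # one write per row
    return y


def csr_smtvp(n_rows: int, n_cols: int, row_ptr: Sequence[int],
              col_idx: Sequence[int], vals: Sequence[float],
              x: Sequence[float]) -> List[float]:
    """Compute y = A^T x from the same CSR arrays (illustrative helper name)."""
    y = [0.0] * n_cols
    for i in range(n_rows):
        xi = x[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Scatter into y: different rows may update the same entry,
            # so a parallel version would need atomics or a reduction.
            y[col_idx[k]] += vals[k] * xi
    return y


# Small assumed example matrix:
# A = [[1, 0, 2],
#      [0, 3, 0]]
ROW_PTR = [0, 2, 3]
COL_IDX = [0, 2, 1]
VALS = [1.0, 2.0, 3.0]

Y_SMVP = csr_smvp(2, ROW_PTR, COL_IDX, VALS, [1.0, 1.0, 1.0])
Y_SMTVP = csr_smtvp(2, 3, ROW_PTR, COL_IDX, VALS, [1.0, 2.0])
```

Block-based layouts such as CSB address this asymmetry by storing nonzeros in blocks that can be traversed in either direction at similar cost; the sketch above only shows the problem being addressed, not that solution.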

Bibliographic record

  • Source
  • Author affiliations

    State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, China, School of Computer Science and Engineering, Beihang University, Beijing, 100191, China, College of Mathematics, Jilin Normal University, Jilin, 136000, China;

    School of Software, Tsinghua University, Beijing, 100084, China;

    School of Software, Tsinghua University, Beijing, 100084, China;

    State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, China, School of Computer Science and Engineering, Beihang University, Beijing, 100191, China;

    State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, China, School of Computer Science and Engineering, Beihang University, Beijing, 100191, China;

    State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, China, School of Computer Science and Engineering, Beihang University, Beijing, 100191, China;

    State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, China, School of Computer Science and Engineering, Beihang University, Beijing, 100191, China;

  • Indexing information
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification
  • Keywords

    sparse matrix-transpose vector product; sparse matrix-vector product; compressed sparse block; CSB; compressed sparse rows; CSR; GPU;

