GPU accelerated sparse matrix-vector multiplication and sparse matrix-transpose vector multiplication

Yuan Tao; Yangdong Deng; Shuai Mu; Zhenzhong Zhang; Mingfa Zhu; Limin Xiao; Li Ruan

Abstract

Many high performance computing applications require computing both the sparse matrix-vector product (SMVP) and the sparse matrix-transpose vector product (SMTVP) for better overall performance. Under such a circumstance, it is critical to maintain a similarly high throughput for these two computing patterns with the underlying sparse matrix encoded in a single storage format. The compressed sparse block (CSB) format proposed by Buluç et al. allows computing both problems on multi-core CPUs with nearly identical throughputs. On the other hand, a direct porting of CSB to graphics processing units (GPUs), which have been recently recognized as a powerful general purpose computing platform, turns out to be inefficient. In this work, we propose a new data structure, designated as expanded CSB (eCSB), to minimize the throughput gap between SMVP and SMTVP computations on GPUs, while at the same time enabling a high computing throughput. We also use a hybrid storage format to store elements in each block, which can be selected dynamically at runtime. Experimental results show that the proposed techniques implemented on a Kepler GPU deliver similar throughput on both SMVP and SMTVP, and the throughput is up to 13 times faster than that of the CPU-based CSB implementation. In addition, our eCSB procedure outperforms the previous GPU results by up to 188% and 914% in computing SMVP and SMTVP, respectively, and we validate the effectiveness of eCSB by means of the wall-clock time of the bi-conjugate gradient algorithm; our eCSB is 25% faster than Compressed Sparse Rows (CSR) and 6% faster than HYB.
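To make the two computing patterns in the abstract concrete, the following is a minimal sketch of SMVP and SMTVP computed from one shared storage format. It uses plain CSR rather than CSB or eCSB, and the function names and the small example matrix are hypothetical, chosen only for illustration. It is not the authors' implementation.

```python
def csr_matvec(n_rows, indptr, indices, data, x):
    """SMVP, y = A x: each row gathers from x and owns one output entry."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y


def csr_matvec_transpose(n_cols, indptr, indices, data, x):
    """SMTVP, y = A^T x, from the same arrays: each entry scatters into y."""
    y = [0.0] * n_cols
    for i in range(len(indptr) - 1):
        for k in range(indptr[i], indptr[i + 1]):
            # Several rows may update the same y[indices[k]].
            y[indices[k]] += data[k] * x[i]
    return y


# Hypothetical 2 x 3 example matrix in CSR form:
# A = [[1, 0, 2],
#      [0, 3, 0]]
indptr = [0, 2, 3]
indices = [0, 2, 1]
data = [1.0, 2.0, 3.0]

smvp = csr_matvec(2, indptr, indices, data, [1.0, 1.0, 1.0])
smtvp = csr_matvec_transpose(3, indptr, indices, data, [1.0, 2.0])
print(smvp)   # [3.0, 3.0]
print(smtvp)  # [1.0, 6.0, 2.0]
```

In the row-oriented product each output entry is written by a single row, whereas in the transpose product different rows can write to the same output entry. In a parallel setting those writes would need atomic updates or a reduction, which is one plausible reason the two products can differ in throughput when a row-oriented format is used for both.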

Bibliographic details

Source
Concurrency and Computation: Practice and Experience | 2015, Issue 14 | pp. 3771-3789 | 19 pages
Authors
Yuan Tao; Yangdong Deng; Shuai Mu; Zhenzhong Zhang; Mingfa Zhu; Limin Xiao; Li Ruan
Author affiliations

State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, China; School of Computer Science and Engineering, Beihang University, Beijing, 100191, China; College of Mathematics, Jilin Normal University, Jilin, 136000, China

School of Software, Tsinghua University, Beijing, 100084, China

School of Software, Tsinghua University, Beijing, 100084, China

State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, China; School of Computer Science and Engineering, Beihang University, Beijing, 100191, China

State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, China; School of Computer Science and Engineering, Beihang University, Beijing, 100191, China

State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, China; School of Computer Science and Engineering, Beihang University, Beijing, 100191, China

State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, China; School of Computer Science and Engineering, Beihang University, Beijing, 100191, China

Indexing information
Original format: PDF
Language: English
Keywords
sparse matrix-transpose vector product; sparse matrix-vector product; compressed sparse block; CSB; compressed sparse rows; CSR; GPU;

Similar literature

1. Performance Prediction Based on Statistics of Sparse Matrix-Vector Multiplication on GPUs [J]. Ruixing Wang, Tongxiang Gu, Ming Li. Journal of Computer and Communications, 2017, Issue 6.
2. Optimization techniques for sparse matrix-vector multiplication on GPUs [J]. Marco Maggioni, Tanya Berger-Wolf. Journal of Parallel and Distributed Computing, 2016, Issue jula.
3. A Novel CSR-Based Sparse Matrix-Vector Multiplication on GPUs [J]. He Guixia, Gao Jiaquan. Mathematical Problems in Engineering, 2016, Issue pta4.
4. Iterative sparse matrix-vector multiplication on in-memory cluster computing accelerated by GPUs for big data [C]. Jiwu Peng, Zheng Xiao, Cen Chen. 2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery, 2016.
5. Analysis of High Performance Sparse Matrix-Vector Multiplication for Small Finite Fields [D]. Lambert, Matthew A. 2020.
6. Hierarchical Orthogonal Matrix Generation and Matrix-Vector Multiplications in Rigid Body Simulations [O]. Fuhui Fang, Jingfang Huang, Gary Huber.
7. Accelerating Sparse Matrix-Vector Multiplication on GPUs using Bit-Representation-Optimized Schemes [O]. Wai Teng Tang, et al. 2013.
