Exploiting dense substructures for fast sparse matrix vector multiplication

Manu Shantharam; Anirban Chatterjee; Padma Raghavan

Abstract

The execution time of many scientific computing applications is dominated by the time spent in performing sparse matrix vector multiplication (SMV; y←A·x). We consider improving the performance of SMV on multicores by exploiting the dense substructures that are inherently present in many sparse matrices derived from partial differential equation models. First, we identify indistinguishable vertices, i.e., vertices with the same adjacency structure, in a graph representation of the sparse matrix (A) and group them into a supernode. Next, we identify effectively dense blocks within the matrix by grouping rows and columns in each supernode. Finally, by using a suitable data structure for this representation of the matrix, we reduce the number of load operations during SMV while exactly preserving the original sparsity structure of A. In addition, we use ordering techniques to enhance locality in accesses to the vector, x, to yield an SMV kernel that exploits the effectively dense substructures in the matrix. We evaluate our scheme on Intel Nehalem and AMD Shanghai processors. We observe that for larger matrices on the Intel Nehalem processor, our method improves performance on average by 37.35% compared with the traditional compressed sparse row scheme (a blocked compressed form improves performance on average by 30.27%). Benefits of our new format are similar for the AMD processor. More importantly, if we pick for each matrix the best among our method and the blocked compressed scheme, the average performance improvements increase to 40.85%. Additional results indicate that the best performing scheme varies depending on the matrix and the system. We therefore propose an effective density measure that could be used for method selection, thus adding to the variety of options for an auto-tuned optimized SMV kernel that can exploit sparse matrix properties and hardware attributes for high performance.
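
The supernode idea in the abstract can be illustrated with a small sketch. This is not the authors' data structure or kernel, only a minimal Python illustration under stated assumptions: the matrix is structurally symmetric with a nonzero diagonal, so rows with identical column patterns are treated as indistinguishable vertices, and each supernode stores its column list once alongside a small dense block of values. All function names (`to_csr`, `spmv_csr`, `find_supernodes`, `to_supernodal`, `spmv_supernodal`) and the example matrix are hypothetical.

```python
from collections import defaultdict


def to_csr(dense):
    """Convert a dense list-of-lists matrix to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def spmv_csr(values, col_idx, row_ptr, x):
    """Baseline CSR kernel: one column-index load per stored nonzero."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def find_supernodes(col_idx, row_ptr):
    """Group rows whose column patterns are identical (assumed supernodes)."""
    groups = defaultdict(list)
    for i in range(len(row_ptr) - 1):
        pattern = tuple(col_idx[row_ptr[i]:row_ptr[i + 1]])
        groups[pattern].append(i)
    return [(list(pattern), rows) for pattern, rows in groups.items()]


def to_supernodal(values, col_idx, row_ptr):
    """Store each supernode as (rows, shared column list, dense value block).

    No explicit zeros are added, so the sparsity structure is unchanged.
    """
    blocks = []
    for cols, rows in find_supernodes(col_idx, row_ptr):
        block = [values[row_ptr[i]:row_ptr[i + 1]] for i in rows]
        blocks.append((rows, cols, block))
    return blocks


def spmv_supernodal(blocks, x, n_rows):
    """Supernodal kernel: column indices are read once per supernode."""
    y = [0.0] * n_rows
    for rows, cols, block in blocks:
        xs = [x[j] for j in cols]  # gather the needed x entries once
        for i, row_vals in zip(rows, block):
            y[i] = sum(v * xj for v, xj in zip(row_vals, xs))
    return y


# Hypothetical 4x4 example: rows 0 and 1 share the column pattern {0, 1, 2}.
A = [
    [4, 1, 2, 0],
    [1, 4, 2, 0],
    [2, 2, 5, 1],
    [0, 0, 1, 3],
]
x = [1, 2, 3, 4]

values, col_idx, row_ptr = to_csr(A)
blocks = to_supernodal(values, col_idx, row_ptr)

y_csr = spmv_csr(values, col_idx, row_ptr, x)
y_sn = spmv_supernodal(blocks, x, len(A))

csr_index_loads = len(col_idx)
supernodal_index_loads = sum(len(cols) for _, cols, _ in blocks)
```

In this toy case both kernels produce the same `y`, while the supernodal layout reads fewer column indices than CSR because rows 0 and 1 share one index list. Unlike a fixed-size blocked format, this layout pads nothing with zeros. The sketch does not model the ordering step or the cache behavior that the abstract's measurements depend on.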

Bibliographic details

Source
International Journal of High Performance Computing Applications | 2011, Issue 3 | pp. 328-341 | 14 pages
Authors
Manu Shantharam; Anirban Chatterjee; Padma Raghavan
Affiliations

Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA, USA (all three authors)
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
compressed storage formats; envelope ordering; performance; sparse matrix vector multiplication; supernodes;

Similar literature

1. LightSpMV: Faster CUDA-Compatible Sparse Matrix-Vector Multiplication Using Compressed Sparse Rows [J]. Liu Yongchao, Schmidt Bertil. Journal of Signal Processing Systems for Signal, Image, and Video Technology. 2018, Issue 1.
2. GPU accelerated sparse matrix-vector multiplication and sparse matrix-transpose vector multiplication [J]. Yuan Tao, Yangdong Deng, Shuai Mu, et al. Concurrency and Computation: Practice and Experience. 2015, Issue 14.
3. Optimal Sparse Matrix Dense Vector Multiplication in the I/O-Model [J]. Michael A. Bender, Gerth Stolting Brodal, Rolf Fagerberg, et al. Theory of Computing Systems. 2010, Issue 4.
4. Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure [C]. Richard W. Vuduc, Hyun-Jin Moon. International Conference on High Performance Computing and Communications. 2005.
5. Fast space-varying convolution in stray light reduction, fast matrix vector multiplication using the sparse matrix transform, and activation detection in fMRI data analysis [D]. Wei, Jianing. 2010.
6. A two-pronged progress in structured dense matrix vector multiplication [O]. Christopher De Sa, Albert Gu, Rohan Puttagunta, et al.
7. Fast sparse matrix-vector multiplication by exploiting variable block structure [O]. Vuduc, R. W.; Moon, H. 2005.
