Efficient multithreaded untransposed, transposed or symmetric sparse matrix-vector multiplication with the Recursive Sparse Blocks format

Martone Michele

首页> 外文期刊>Parallel Computing >Efficient multithreaded untransposed, transposed or symmetric sparse matrix-vector multiplication with the Recursive Sparse Blocks format

【24h】

Efficient multithreaded untransposed, transposed or symmetric sparse matrix-vector multiplication with the Recursive Sparse Blocks format

机译：递归稀疏块格式的高效多线程未转置，转置或对称稀疏矩阵矢量乘法

获取原文

获取原文并翻译 | 示例

开具论文收录证明 >>

页面导航

摘要
著录项
引文网络
相似文献
相关主题

摘要

In earlier work we have introduced the "Recursive Sparse Blocks" (RSB) sparse matrix storage scheme oriented towards cache efficient matrix-vector multiplication (SpMV) and triangular solution (SpSV) on cache based shared memory parallel computers. Both the transposed (SpMV_T) and symmetric (SymSpMV) matrix-vector multiply variants are supported. RSB stands for a meta-format: it recursively partitions a rectangular sparse matrix in quadrants; leaf submatrices are stored in an appropriate traditional format - either Compressed Sparse Rows (CSR) or Coordinate (COO). In this work, we compare the performance of our RSB implementation of SpMV, SpMV_T, SymSpMV to that of the state-of-the-art Intel Math Kernel Library (MKL) CSR implementation on the recent Intel's Sandy Bridge processor. Our results with a few dozens of real world large matrices suggest the efficiency of the approach: in all of the cases, RSB's SymSpMV (and in most cases, SpMV_T as well) took less than half of MKL CSR's time; SpMV's advantage was smaller. Furthermore, RSB's SpMV T is more scalable than MKL's CSR, in that it performs almost as well as SpMV. Additionally, we include comparisons to the state-of-the art format Compressed Sparse Blocks (CSB) implementation. We observed RSB to be slightly superior to CSB in SpMV_T, slightly inferior in SpMV, and better (in most cases by a factor of two or more) in SymSpMV. Although RSB is a non-traditional storage format and thus needs a special constructor, it can be assembled from CSR or any other similar row-ordered representation arrays in the time of a few dozens of matrix-vector multiply executions. Thanks to its significant advantage over MKL's CSR routines for symmetric or transposed matrix-vector multiplication, in most of the observed cases the assembly cost has been observed to amortize with fewer than fifty iterations. (C) 2014 Elsevier B.V. All rights reserved.

机译：在较早的工作中，我们在基于缓存的共享内存并行计算机上引入了“递归稀疏块”（RSB）稀疏矩阵存储方案，该方案面向高速缓存有效的矩阵矢量乘法（SpMV）和三角解（SpSV）。支持转置（SpMV_T）和对称（SymSpMV）矩阵矢量乘法变体。 RSB代表一种元格式：它将一个矩形的稀疏矩阵递归地划分为多个象限；叶子子矩阵以适当的传统格式存储-压缩稀疏行（CSR）或坐标（COO）。在这项工作中，我们将SpMV，SpMV_T，SymSpMV的RSB实现与最新的Intel Sandy Bridge处理器上最新的Intel Math Kernel Library（MKL）CSR实现的性能进行了比较。我们在数十个现实世界中的大型矩阵的结果表明了该方法的有效性：在所有情况下，RSB的SymSpMV（在大多数情况下，还包括SpMV_T）花费的时间不到MKL CSR的一半。 SpMV的优势较小。此外，RSB的SpMV T比MKL的CSR具有更高的可扩展性，因为它的性能几乎与SpMV一样好。此外，我们还包括与最新格式的压缩稀疏块（CSB）实现的比较。我们观察到RSB在SpMV_T中略胜于CSB，在SpMV中略逊于CSB，在SymSpMV中则更好（大多数情况下是两倍或更多倍）。尽管RSB是一种非传统的存储格式，因此需要特殊的构造函数，但是它可以在几十个矩阵矢量乘法执行时从CSR或任何其他类似的行序表示数组进行组装。由于它相对于MKL的CSR例程具有对称或转置矩阵向量乘法的显着优势，因此在大多数观察到的情况下，观察到的组装成本可以用少于五十次的迭代来摊销。（C）2014 Elsevier B.V.保留所有权利。

著录项

来源
《Parallel Computing》 |2014年第7期|251-270|共20页
作者
Martone Michele;
展开▼
作者单位

Max Planck Inst Plasma Phys, Boltzmannstr 2, D-85748 Garching, Germany;

展开▼
收录信息美国《科学引文索引》(SCI);美国《工程索引》(EI);
原文格式 PDF
正文语种 eng
中图分类
关键词
Sparse matrix-vector multiply; Symmetric matrix-vector multiply; Transpose matrix-vector multiply; Shared memory parallel; Cache blocking; Sparse matrix assembly;

机译：稀疏矩阵向量乘;对称矩阵向量乘;转置矩阵向量乘;共享内存并行;缓存块;稀疏矩阵组装;

相似文献

外文文献
中文文献
专利

1. A Recursive Algebraic Coloring Technique for Hardware-efficient Symmetric Sparse Matrix-vector Multiplication [J] . CHRISTIE ALAPPAT, ACHIM BASERMANN, ALAN R.BISHOP, ACM Transactions on Parallel Computing . 2020,第3期

机译：硬件高效对称稀疏矩阵乘法递归代数着色技术
2. A UNIFIED SPARSE MATRIX DATA FORMAT FOR EFFICIENT GENERAL SPARSE MATRIX-VECTOR MULTIPLICATION ON MODERN PROCESSORS WITH WIDE SIMD UNITS [J] . Kreutzer Moritz, Hager Georg, Wellein Gerhard, SIAM Journal on Scientific Computing . 2014,第5期

机译：在具有宽模拟单元的现代处理器上有效地通用稀疏矩阵-向量乘法的统一稀疏矩阵数据格式
3. Efficient sparse matrix-vector multiplication using cache oblivious extension quadtree storage format [J] . Jilin Zhang, Jian Wan, Fangfang Li, Future generation computer systems . 2016,第JANa期

机译：使用缓存遗忘扩展四叉树存储格式的有效稀疏矩阵矢量乘法
4. Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks [C] . Aydin Buluc, Jeremy T. Fineman, Matteo Frigo, 21st annual symposium on parallelism in algorithms and architectures 2009 . 2009

机译：使用压缩稀疏块的并行稀疏矩阵向量和矩阵转置向量乘法
5. Analysis of High Performance Sparse Matrix-Vector Multiplication for Small Finite Fields [D] . Lambert, Matthew A. 2020

机译：小型有限字段高性能稀疏矩阵矢量乘法分析
6. A Fast and Robust Non-Sparse Signal Recovery Algorithm for Wearable ECG Telemonitoring Using ADMM-Based Block Sparse Bayesian Learning [O] . Yunfei Cheng, Yalan Ye, Mengshu Hou, 2018

机译：基于ADMM的块稀疏贝叶斯学习的可穿戴式ECG远程监控的快速鲁棒非稀疏信号恢复算法
7. Efficient Multithreaded Untransposed, Transposed or Symmetric Sparse Matrix-Vector Multiplication with the Recursive Sparse Blocks Format [O] . Michele Martonea 2016

机译：具有递归稀疏块格式的高效多线程非转置，转置或对称稀疏矩阵 - 向量乘法

Efficient multithreaded untransposed, transposed or symmetric sparse matrix-vector multiplication with the Recursive Sparse Blocks format

摘要

著录项

引文网络

相似文献

相关主题

期刊订阅