首页> 外文期刊>Parallel Computing >Efficient multithreaded untransposed, transposed or symmetric sparse matrix-vector multiplication with the Recursive Sparse Blocks format
【24h】

Efficient multithreaded untransposed, transposed or symmetric sparse matrix-vector multiplication with the Recursive Sparse Blocks format

机译:递归稀疏块格式的高效多线程未转置,转置或对称稀疏矩阵矢量乘法

获取原文
获取原文并翻译 | 示例

摘要

In earlier work we have introduced the "Recursive Sparse Blocks" (RSB) sparse matrix storage scheme oriented towards cache efficient matrix-vector multiplication (SpMV) and triangular solution (SpSV) on cache based shared memory parallel computers. Both the transposed (SpMV_T) and symmetric (SymSpMV) matrix-vector multiply variants are supported. RSB stands for a meta-format: it recursively partitions a rectangular sparse matrix in quadrants; leaf submatrices are stored in an appropriate traditional format - either Compressed Sparse Rows (CSR) or Coordinate (COO). In this work, we compare the performance of our RSB implementation of SpMV, SpMV_T, SymSpMV to that of the state-of-the-art Intel Math Kernel Library (MKL) CSR implementation on the recent Intel's Sandy Bridge processor. Our results with a few dozens of real world large matrices suggest the efficiency of the approach: in all of the cases, RSB's SymSpMV (and in most cases, SpMV_T as well) took less than half of MKL CSR's time; SpMV's advantage was smaller. Furthermore, RSB's SpMV T is more scalable than MKL's CSR, in that it performs almost as well as SpMV. Additionally, we include comparisons to the state-of-the art format Compressed Sparse Blocks (CSB) implementation. We observed RSB to be slightly superior to CSB in SpMV_T, slightly inferior in SpMV, and better (in most cases by a factor of two or more) in SymSpMV. Although RSB is a non-traditional storage format and thus needs a special constructor, it can be assembled from CSR or any other similar row-ordered representation arrays in the time of a few dozens of matrix-vector multiply executions. Thanks to its significant advantage over MKL's CSR routines for symmetric or transposed matrix-vector multiplication, in most of the observed cases the assembly cost has been observed to amortize with fewer than fifty iterations. (C) 2014 Elsevier B.V. All rights reserved.
机译:在较早的工作中,我们在基于缓存的共享内存并行计算机上引入了“递归稀疏块”(RSB)稀疏矩阵存储方案,该方案面向高速缓存有效的矩阵矢量乘法(SpMV)和三角解(SpSV)。支持转置(SpMV_T)和对称(SymSpMV)矩阵矢量乘法变体。 RSB代表一种元格式:它将一个矩形的稀疏矩阵递归地划分为多个象限;叶子子矩阵以适当的传统格式存储-压缩稀疏行(CSR)或坐标(COO)。在这项工作中,我们将SpMV,SpMV_T,SymSpMV的RSB实现与最新的Intel Sandy Bridge处理器上最新的Intel Math Kernel Library(MKL)CSR实现的性能进行了比较。我们在数十个现实世界中的大型矩阵的结果表明了该方法的有效性:在所有情况下,RSB的SymSpMV(在大多数情况下,还包括SpMV_T)花费的时间不到MKL CSR的一半。 SpMV的优势较小。此外,RSB的SpMV T比MKL的CSR具有更高的可扩展性,因为它的性能几乎与SpMV一样好。此外,我们还包括与最新格式的压缩稀疏块(CSB)实现的比较。我们观察到RSB在SpMV_T中略胜于CSB,在SpMV中略逊于CSB,在SymSpMV中则更好(大多数情况下是两倍或更多倍)。尽管RSB是一种非传统的存储格式,因此需要特殊的构造函数,但是它可以在几十个矩阵矢量乘法执行时从CSR或任何其他类似的行序表示数组进行组装。由于它相对于MKL的CSR例程具有对称或转置矩阵向量乘法的显着优势,因此在大多数观察到的情况下,观察到的组装成本可以用少于五十次的迭代来摊销。 (C)2014 Elsevier B.V.保留所有权利。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号