Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure

Abstract

We improve the performance of sparse matrix-vector multiplication (SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks. Matrices from finite element modeling applications often have this structure. We split the matrix, A, into a sum, A_1 + A_2 + ... + A_s, where each term is stored in a new data structure we refer to as unaligned block compressed sparse row (UBCSR) format. A classical approach that stores A in block compressed sparse row (BCSR) format can also reduce execution time, but the improvements may be limited because BCSR imposes an alignment on the matrix non-zeros that leads to extra work from filled-in zeros. Combining splitting with UBCSR reduces this extra work while retaining the generally lower memory bandwidth requirements and register-level tiling opportunities of BCSR. On a set of application matrices, we show speedups of up to 2.1x over no blocking and up to 1.8x over BCSR as used in prior work. Even when performance does not improve significantly, split UBCSR usually reduces matrix storage.
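To make the fill-in argument concrete, here is a minimal Python sketch. It is not the authors' UBCSR data structure, whose layout the abstract does not specify. It assumes a toy matrix stored as a dictionary of coordinates, and the helper names (blocked_storage, fill_ratio, spmv) are hypothetical. A block grid with a shifted origin is used only as a stand-in for "unaligned" blocks. The sketch counts how many values a uniform 2x2 blocking must store when one dense block does not sit on the aligned grid, and then checks that splitting A into A_1 + A_2 leaves the SpMV result unchanged.

```python
# Illustrative sketch only: a sparse matrix is a dict {(row, col): value}.
# Helper names are hypothetical; this is NOT the UBCSR format itself.

def blocked_storage(nonzeros, r, c, row_off=0, col_off=0):
    """Count values stored when the nonzeros are covered by r x c blocks
    on a uniform grid whose origin is shifted by (row_off, col_off)."""
    blocks = {((i - row_off) // r, (j - col_off) // c) for (i, j) in nonzeros}
    return len(blocks) * r * c

def fill_ratio(nonzeros, r, c, row_off=0, col_off=0):
    """Stored values (including explicit zeros) divided by true nonzeros."""
    return blocked_storage(nonzeros, r, c, row_off, col_off) / len(nonzeros)

def spmv(nonzeros, x, n_rows):
    """Reference y = A * x over the coordinate dictionary."""
    y = [0.0] * n_rows
    for (i, j), v in nonzeros.items():
        y[i] += v * x[j]
    return y

# A 5 x 5 matrix with two dense 2 x 2 blocks: one starting at (0, 0),
# which lies on the aligned grid, and one starting at (3, 3), which does not.
n = 5
A = {}
for (bi, bj) in [(0, 0), (3, 3)]:
    for di in range(2):
        for dj in range(2):
            A[(bi + di, bj + dj)] = 1.0 + len(A)

# Aligned 2 x 2 blocking: the block at (3, 3) straddles four grid cells.
aligned_ratio = fill_ratio(A, 2, 2)

# Split A = A1 + A2 and give each term its own grid origin (an assumption
# standing in for unaligned blocks).
A1 = {k: v for k, v in A.items() if k[0] < 2}
A2 = {k: v for k, v in A.items() if k[0] >= 2}
split_stored = blocked_storage(A1, 2, 2) + blocked_storage(A2, 2, 2, 1, 1)
split_ratio = split_stored / len(A)

# SpMV is linear, so the per-term products sum to the full product.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_full = spmv(A, x, n)
y_split = [a + b for a, b in zip(spmv(A1, x, n), spmv(A2, x, n))]

print(aligned_ratio, split_ratio, y_full == y_split)
```

In this toy case the aligned blocking stores 20 values for 8 true non-zeros, while the split version stores exactly 8. The abstract's reported speedups come from the authors' measurements on application matrices, not from anything this sketch demonstrates.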

Bibliographic details

Source: International Conference on High Performance Computing and Communications, 2005, 10 pages
Authors: Richard W. Vuduc; Hyun-Jin Moon
Original format: PDF
Chinese Library Classification: Design and performance analysis

