首页> 外文会议>International Conference on High Performance Computing and Communications >Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure
【24h】

Fast Sparse Matrix-Vector Multiplication by Exploiting Variable Block Structure

机译:利用可变块结构,快速稀疏矩阵矢量乘法

获取原文

摘要

We improve the performance of sparse matrix-vector multiplication (SpMV) on modern cache-based superscalar machines when the matrix structure consists of multiple, irregularly aligned rectangular blocks. Matrices from finite element modeling applications often have this structure. We split the matrix, A, into a sum, A_1 + A_2 + ... + A_s, where each term is stored in a new data structure we refer to as unaligned block compressed sparse row (UBCSR) format . A classical approach which stores A in a block compressed sparse row (BCSR) format can also reduce execution time, but the improvements may be limited because BCSR imposes an alignment of the matrix non-zeros that leads to extra work from filled-in zeros. Combining splitting with UBCSR reduces this extra work while retaining the generally lower memory bandwidth requirements and register-level tiling opportunities of BCSR. We show speedups can be as high as 2.1x over no blocking, and as high as 1.8x over BCSR as used in prior work on a set of application matrices. Even when performance does not improve significantly, split UBCSR usually reduces matrix storage.
机译:当矩阵结构由多个不规则对齐的矩形块组成时,我们提高了基于现代高速缓存的超卡机器上的稀疏矩阵 - 矢量乘法(SPMV)的性能。来自有限元建模应用程序的矩阵通常具有这种结构。我们将矩阵A分为总和,A_1 + A_2 + ... + A_S,其中每个术语存储在新的数据结构中,我们将其称为未对齐的块压缩稀疏行(UBCSR)格式。在块压缩稀疏行(BCSR)格式中存储A的经典方法也可以减少执行时间,但是可以限制改进,因为BCSR施加了导致填充零的额外工作的矩阵非零的对准。使用UBCSR的组合拆分减少了这种额外的工作,同时保留了BCSR的通常更低的内存带宽要求和注册级别平铺机会。我们将显示出高于2.1倍的加速度,在一组应用程序矩阵上使用的BCSR中使用的BCSR高达1.8倍。即使性能没有显着提高,拆分UBCSR通常也会减少矩阵存储。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号