Journal: Concurrency and Computation: Practice and Experience

An efficient SIMD compression format for sparse matrix-vector multiplication

Abstract

Sparse matrix-vector multiplication (SpMV) is an essential kernel in sparse linear algebra and has been studied extensively on all modern processor and accelerator architectures. Compressed Sparse Row (CSR) is a frequently used format for sparse matrix storage. However, CSR-based SpMV performs poorly on processors with vector units. To take full advantage of SIMD acceleration in SpMV, we propose a new matrix storage format called CSR-SIMD. The new storage format compresses the non-zero elements into many variable-length data fragments with consecutive memory access addresses. Thus, the data locality of the sparse matrix A and the dense vector x improves, and the floating-point operations for each fragment can be computed entirely by a vectorized implementation on wide SIMD units. Our experimental results indicate that CSR-SIMD has better storage efficiency and low overhead for format conversion. In addition, the new format achieves high scalability on wide SIMD units. In comparison with CSR-based and BCSR-based SpMV, CSR-SIMD obtains better performance on FT1500A, Intel Xeon, and Intel Xeon Phi.
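
To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows a plain CSR SpMV, where every access to x goes through a column index, next to a fragment-based variant that groups each row's non-zeros into runs of consecutive column indices. The abstract does not specify the CSR-SIMD layout, so the fragment tuple `(row, start, first_col, length)` and the function names `row_fragments` and `fragment_spmv` are assumptions made only for illustration.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Baseline y = A @ x with A stored in CSR (row_ptr, col_idx, values)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access to x through col_idx: this gather pattern
            # is what makes plain CSR hard to vectorize.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def row_fragments(row_ptr, col_idx):
    """Split each row into runs of consecutive column indices.

    Hypothetical layout (an assumption, not taken from the paper):
    each fragment is (row, start offset in values, first column, length).
    """
    frags = []
    for i in range(len(row_ptr) - 1):
        k, end = row_ptr[i], row_ptr[i + 1]
        while k < end:
            start = k
            while k + 1 < end and col_idx[k + 1] == col_idx[k] + 1:
                k += 1
            frags.append((i, start, col_idx[start], k - start + 1))
            k += 1
    return frags


def fragment_spmv(n_rows, frags, values, x):
    """y = A @ x computed fragment by fragment."""
    y = [0.0] * n_rows
    for row, start, first_col, length in frags:
        # Both operands are contiguous slices, so a SIMD unit could use
        # unit-stride loads here instead of gathers.
        a = values[start:start + length]
        b = x[first_col:first_col + length]
        y[row] += sum(p * q for p, q in zip(a, b))
    return y


# Example: A = [[1, 2, 0, 0],
#               [0, 0, 3, 0],
#               [4, 0, 5, 6]]
row_ptr = [0, 2, 3, 6]
col_idx = [0, 1, 2, 0, 2, 3]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 2.0, 3.0, 4.0]

frags = row_fragments(row_ptr, col_idx)
y_csr = csr_spmv(row_ptr, col_idx, values, x)
y_frag = fragment_spmv(len(row_ptr) - 1, frags, values, x)
```

In this sketch the fragment variant stores one starting column per run rather than one column index per non-zero, which is one way a fragment-based format could reduce index storage when non-zeros cluster along rows. Whether CSR-SIMD does exactly this is not stated in the abstract.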
