An efficient SIMD compression format for sparse matrix-vector multiplication

Xinhai Chen; Peizhen Xie; Lihua Chi; Jie Liu; Chunye Gong

Abstract

Sparse matrix-vector multiplication (SpMV) is an essential kernel in sparse linear algebra and has been studied extensively on all modern processor and accelerator architectures. Compressed Sparse Row (CSR) is a frequently used format for sparse matrix storage. However, CSR-based SpMV performs poorly on processors with vector units. To take full advantage of SIMD acceleration in SpMV, we propose a new matrix storage format called CSR-SIMD. The new storage format compresses the non-zero elements into many variable-length data fragments with consecutive memory access addresses. Thus, the data locality of the sparse matrix A and the dense vector x improves, and the floating-point operations for each fragment can be computed entirely by a vectorized implementation on wide SIMD units. Our experimental results indicate that CSR-SIMD has better storage efficiency and low overhead for format conversion. In addition, the new format achieves high scalability on wide SIMD units. Compared with CSR-based and BCSR-based SpMV, CSR-SIMD obtains better performance on FT1500A, Intel Xeon, and Intel Xeon Phi.
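
The contrast the abstract draws between CSR and a fragment-based layout can be illustrated with a small Python sketch. It is not the CSR-SIMD format as defined in the paper, whose exact layout the abstract does not give. It assumes that a "fragment" is a run of non-zeros in one row whose column indices are consecutive, and the function names and the (row, first_col, first_k, length) tuple are hypothetical choices made for illustration only.

```python
# Illustrative sketch only. The fragment layout is an assumption and is
# NOT the CSR-SIMD format as specified in the paper.

def csr_spmv(row_ptr, col_idx, values, x):
    """Baseline y = A @ x with A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # x is read through col_idx (an indirect, gather-style access),
            # which is what makes plain CSR awkward for vector units.
            y[row] += values[k] * x[col_idx[k]]
    return y


def build_fragments(row_ptr, col_idx):
    """Split each row into runs of consecutive column indices.

    Returns (row, first_col, first_k, length) tuples, a hypothetical layout.
    """
    fragments = []
    for row in range(len(row_ptr) - 1):
        k, end = row_ptr[row], row_ptr[row + 1]
        while k < end:
            start = k
            # Extend the run while the next column index is adjacent.
            while k + 1 < end and col_idx[k + 1] == col_idx[k] + 1:
                k += 1
            k += 1
            fragments.append((row, col_idx[start], start, k - start))
    return fragments


def fragment_spmv(n_rows, fragments, values, x):
    """y = A @ x computed fragment by fragment."""
    y = [0.0] * n_rows
    for row, first_col, first_k, length in fragments:
        # Both operands are contiguous slices, so each fragment is a
        # unit-stride dot product with no index lookups into x.
        a = values[first_k:first_k + length]
        b = x[first_col:first_col + length]
        y[row] += sum(v * w for v, w in zip(a, b))
    return y
```

For the 4x4 matrix with rows [1 2 0 0], [0 0 3 0], [4 0 5 6], [0 0 0 7], that is row_ptr = [0, 2, 3, 6, 7], col_idx = [0, 1, 2, 0, 2, 3, 3], values = [1.0, ..., 7.0], and x = [1.0, 2.0, 3.0, 4.0], both csr_spmv and fragment_spmv return [5.0, 9.0, 43.0, 28.0], and build_fragments yields five fragments of lengths 2, 1, 1, 2, 1.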

Bibliographic details

Source
Concurrency and Computation: Practice and Experience | 2018, issue 23 | e4800.1-e4800.10 | 10 pages
Authors
Xinhai Chen; Peizhen Xie; Lihua Chi; Jie Liu; Chunye Gong
Author affiliations

Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China;

Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China;

Institute of Advanced Science and Technology, Hunan Institute of Traffic Engineering, Hengyang, China;

Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China;

Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China;

Original format: PDF
Language of text: English
Keywords
compressed sparse row format (CSR); performance optimization; single instruction multiple data (SIMD); sparse matrix-vector multiplication (SpMV)

Similar literature

1. A unified sparse matrix data format for efficient general sparse matrix-vector multiplication on modern processors with wide SIMD units [J]. Kreutzer Moritz, Hager Georg, Wellein Gerhard. SIAM Journal on Scientific Computing, 2014, issue 5.
2. VBSF: a new storage format for SIMD sparse matrix-vector multiplication on modern processors [J]. Journal of Supercomputing, 2020, issue 3.
3. Efficient multithreaded untransposed, transposed or symmetric sparse matrix-vector multiplication with the Recursive Sparse Blocks format [J]. Martone Michele. Parallel Computing, 2014, issue 7.
4. FPGA Implementations of 3D-SIMD Processor Architecture for Deep Neural Networks Using Relative Indexed Compressed Sparse Filter Encoding Format and Stacked Filters Stationary Flow [C]. Yuechao Gao, Nianhong Liu, Sheng Zhang. International Conference on Automation, Mechanical Control and Computational Engineering, 2018.
5. Sparse and Low-Rank Techniques for the Efficient Restoration of Images [D]. Zhang, Mingli. 2017.
6. Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences [O]. Kirill Kryukov, Mahoko Takahashi Ueda, So Nakagawa.
7. A unified sparse matrix data format for efficient general sparse matrix-vector multiply on modern processors with wide SIMD units [O]. Kreutzer, Moritz; Hager, Georg; Wellein, Gerhard. 2014.
