Journal: PeerJ Computer Science

Computing the sparse matrix vector product using block-based kernels without zero padding on processors with AVX-512 instructions



Abstract

The sparse matrix-vector product (SpMV) is a fundamental operation in many scientific applications from various fields. The High Performance Computing (HPC) community has therefore continuously invested a lot of effort to provide an efficient SpMV kernel on modern CPU architectures. Although it has been shown that block-based kernels help to achieve high performance, they are difficult to use in practice because of the zero padding they require. In the current paper, we propose new kernels using the AVX-512 instruction set, which makes it possible to use a blocking scheme without any zero padding in the matrix memory storage. We describe mask-based sparse matrix formats and their corresponding SpMV kernels highly optimized in assembly language. Considering that the optimal blocking size depends on the matrix, we also provide a method to predict the best kernel to be used utilizing a simple interpolation of results from previous executions. We compare the performance of our approach to that of the Intel MKL CSR kernel and the CSR5 open-source package on a set of standard benchmark matrices. We show that we can achieve significant improvements in many cases, both for sequential and for parallel executions. Finally, we provide the corresponding code in an open source library, called SPC5.
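The idea of a mask-based block format without zero padding can be illustrated with a small scalar sketch. This is not the SPC5 code, whose kernels are written in assembly for AVX-512. It is a plain Python illustration under several assumptions: blocks are 1 row by 8 columns, column indices are sorted within each CSR row, and the function names `csr_to_mask_blocks` and `spmv_mask_blocks` are hypothetical. Each block stores only its first column and an 8-bit mask marking which positions hold nonzeros, while the values array stays packed exactly as in CSR, so no zeros are stored.

```python
def csr_to_mask_blocks(row_ptr, col_idx, block_width=8):
    """Group the nonzeros of each CSR row into 1 x block_width blocks.

    Illustrative only: assumes col_idx is sorted within every row.
    Returns (block_row_ptr, block_col, block_mask). The CSR values
    array is reused unchanged, so nothing is padded with zeros.
    """
    block_row_ptr = [0]
    block_col = []   # column covered by bit 0 of each block
    block_mask = []  # bit k set => column block_col + k is a nonzero
    for i in range(len(row_ptr) - 1):
        j, end = row_ptr[i], row_ptr[i + 1]
        while j < end:
            start_col = col_idx[j]
            mask = 0
            # Absorb every nonzero that falls inside this block's window.
            while j < end and col_idx[j] < start_col + block_width:
                mask |= 1 << (col_idx[j] - start_col)
                j += 1
            block_col.append(start_col)
            block_mask.append(mask)
        block_row_ptr.append(len(block_col))
    return block_row_ptr, block_col, block_mask


def spmv_mask_blocks(block_row_ptr, block_col, block_mask, values, x,
                     block_width=8):
    """Compute y = A @ x from the mask-based block representation."""
    y = [0.0] * (len(block_row_ptr) - 1)
    v = 0  # running position in the packed values array
    for i in range(len(y)):
        acc = 0.0
        for b in range(block_row_ptr[i], block_row_ptr[i + 1]):
            mask = block_mask[b]
            for k in range(block_width):
                if (mask >> k) & 1:
                    # Only set bits consume a stored value.
                    acc += values[v] * x[block_col[b] + k]
                    v += 1
        y[i] = acc
    return y


# A 3 x 10 example matrix in CSR form (row 1 is empty).
row_ptr = [0, 3, 3, 6]
col_idx = [0, 1, 9, 2, 5, 7]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [float(c) for c in range(1, 11)]

brp, bcol, bmask = csr_to_mask_blocks(row_ptr, col_idx)
y = spmv_mask_blocks(brp, bcol, bmask, values, x)
```

In this sketch the six nonzeros are described by three blocks instead of six column indices, and the inner loop over mask bits is the part a vectorized kernel would replace. AVX-512 provides masked expand-load operations (for example the `_mm512_maskz_expandloadu_pd` intrinsic) that place packed values into the vector lanes selected by a mask. The abstract does not say which instructions the authors' kernels use, so that mapping is an assumption here.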
