Journal: ACM Transactions on Architecture and Code Optimization

BestSF: A Sparse Meta-Format for Optimizing SpMV on GPU


Abstract

The Sparse Matrix-Vector Multiplication (SpMV) kernel dominates the computing cost in numerous scientific applications. Many implementations based on different sparse formats have been proposed to improve this kernel on recent GPU architectures. However, it has been widely observed that there is no "best-for-all" sparse format for the SpMV kernel on GPU. Indeed, performance can degrade by an order of magnitude when the sparse format is not selected carefully. To address this problem, we propose in this article BestSF (Best Sparse Format), a new learning-based sparse meta-format that automatically selects the most appropriate sparse format for a given input matrix. To do so, BestSF relies on a cost-sensitive classification system trained using Weighted Support Vector Machines (WSVMs) to predict the best sparse format for each input sparse matrix. Our experimental results on two different NVIDIA GPU architectures using a large number of real-world sparse matrices show that BestSF achieves a noticeable overall performance improvement over using a single sparse format. While BestSF is trained to select the best sparse format in terms of performance (GFLOPS), our further experiments revealed that using BestSF also led, in most of the test cases, to the best energy efficiency (MFLOPS/W). To demonstrate its practical effectiveness, we also evaluate the performance and energy efficiency improvement achieved when using BestSF as a building block in a GPU-based Preconditioned Conjugate Gradient (PCG) iterative solver.
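
To make the "no best-for-all format" observation concrete, the following is a minimal CPU-side sketch in plain Python, not the authors' GPU implementation. It assumes two well-known formats, CSR and ELL, purely for illustration; the abstract does not name the formats BestSF chooses among. All function names, including `ell_padding_ratio`, are hypothetical helpers introduced here. The padding ratio is one plausible example of a matrix feature that a format selector could use, not a feature confirmed by the source.

```python
# Illustrative sketch only. CSR and ELL are assumed example formats;
# the abstract does not list the formats or features BestSF actually uses.

def dense_to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # end offset of this row's nonzeros
    return values, col_idx, row_ptr


def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x with A stored in CSR."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def dense_to_ell(dense):
    """Convert to ELL: every row is padded to the longest row's nonzero count."""
    rows = [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]
    width = max((len(r) for r in rows), default=0)
    ell_vals = [[v for _, v in r] + [0.0] * (width - len(r)) for r in rows]
    ell_cols = [[j for j, _ in r] + [0] * (width - len(r)) for r in rows]
    return ell_vals, ell_cols


def spmv_ell(ell_vals, ell_cols, x):
    """Compute y = A @ x with A stored in ELL; padded slots contribute zero."""
    return [sum(v * x[j] for v, j in zip(vals, cols))
            for vals, cols in zip(ell_vals, ell_cols)]


def ell_padding_ratio(dense):
    """Stored ELL slots divided by true nonzeros (1.0 means no padding)."""
    nnz_per_row = [sum(1 for v in row if v != 0) for row in dense]
    nnz = sum(nnz_per_row)
    return len(dense) * max(nnz_per_row) / nnz if nnz else float("inf")


# Two 4x4 matrices: one with uniform row lengths, one with a single long row.
regular = [[1, 2, 0, 0], [0, 3, 4, 0], [0, 0, 5, 6], [7, 0, 0, 8]]
skewed = [[1, 1, 1, 1], [0, 2, 0, 0], [0, 0, 3, 0], [0, 0, 0, 4]]
x = [1.0, 2.0, 3.0, 4.0]

for name, m in (("regular", regular), ("skewed", skewed)):
    y_csr = spmv_csr(*dense_to_csr(m), x)
    y_ell = spmv_ell(*dense_to_ell(m), x)
    print(name, y_csr == y_ell, round(ell_padding_ratio(m), 2))
```

Both formats produce the same product, but the storage and work differ. The regular matrix has a padding ratio of 1.0, so ELL stores nothing extra, while the skewed matrix has a ratio of 16/7 (about 2.29), so ELL stores and multiplies more than twice as many entries as CSR. Structural differences of this kind are why a single fixed format can perform poorly on some inputs.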
