
Performance optimization, modeling and analysis of sparse matrix-matrix products on multi-core and many-core processors



Abstract

Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in areas ranging from traditional numerical applications to recent big data analysis and machine learning. Although many SpGEMM algorithms have been proposed, hardware-specific optimizations for multi- and many-core processors are lacking, and a detailed analysis of their performance under various use cases and matrices is not available. We first identify and mitigate multiple bottlenecks in memory management and thread scheduling on Intel Xeon Phi (Knights Landing, or KNL). Specifically targeting many-core processors, we develop a hash-table-based algorithm and optimize a heap-based shared-memory SpGEMM algorithm. We examine their performance together with other publicly available codes. Unlike prior evaluations in the literature, ours also includes use cases that are representative of real graph algorithms, such as multi-source breadth-first search and triangle counting. Our hash-table- and heap-based algorithms achieve significant speedups over existing libraries in the majority of cases, while different algorithms dominate the remaining scenarios depending on matrix size, sparsity, compression factor, and operation type. We distill these in-depth evaluation results into a recipe for selecting the best SpGEMM algorithm for a target scenario, and we build performance models for the hash-table- and heap-based algorithms that support the recipe. A critical finding is that hash-table-based SpGEMM gets a significant performance boost if the nonzeros are not required to be sorted within each row of the output matrix. Finally, we integrate our implementations into a large-scale protein clustering code named HipMCL, accelerating its SpGEMM kernel by up to 10X and boosting the overall performance of the whole HipMCL application by 2.6X. (C) 2019 Elsevier B.V. All rights reserved.
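To make the hash-table approach concrete, below is a minimal sketch (not the paper's optimized implementation) of row-wise SpGEMM with a per-row hash-table accumulator, operating on CSR triples `(indptr, indices, data)`. The `sort_rows` flag illustrates the abstract's key finding: skipping the per-row sort of output column indices removes work when sorted rows are not required. Function and variable names here are illustrative assumptions.

```python
# Sketch of row-wise SpGEMM (Gustavson's scheme) with a hash-table
# accumulator per output row. Inputs a and b are CSR triples:
# (indptr, indices, data). Returns the CSR triple of C = A * B.
def spgemm_hash(a, b, sort_rows=True):
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b
    c_ptr, c_idx, c_val = [0], [], []
    n_rows = len(a_ptr) - 1
    for i in range(n_rows):
        acc = {}  # hash table: output column index -> accumulated value
        for k in range(a_ptr[i], a_ptr[i + 1]):
            j, av = a_idx[k], a_val[k]
            # Scatter row j of B, scaled by A[i, j], into the accumulator.
            for l in range(b_ptr[j], b_ptr[j + 1]):
                col = b_idx[l]
                acc[col] = acc.get(col, 0.0) + av * b_val[l]
        # Skipping this sort (sort_rows=False) is what yields the
        # reported speedup when sorted output rows are not needed.
        cols = sorted(acc) if sort_rows else list(acc)
        c_idx.extend(cols)
        c_val.extend(acc[c] for c in cols)
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val
```

In the many-core setting the paper targets, each thread would process a disjoint set of rows with its own accumulator; this serial sketch only shows the per-row data flow.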

Bibliographic record

  • Source
    Parallel Computing | 2019, Issue 12 | pp. 102545.1-102545.13 | 13 pages
  • Author affiliations

    Tokyo Inst Technol, Tokyo, Japan;

    RIKEN Ctr Computat Sci, Kobe, Hyogo, Japan | Tokyo Inst Technol, Dept Math & Comp Sci, Tokyo, Japan;

    Indiana Univ, Bloomington, IN, USA;

    Lawrence Berkeley Natl Lab, Berkeley, CA, USA | Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA, USA;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Sparse matrix; SpGEMM; Intel KNL;


