Journal: IEEE Transactions on Parallel and Distributed Systems

Performance-Aware Model for Sparse Matrix-Matrix Multiplication on the Sunway TaihuLight Supercomputer



Abstract

General sparse matrix-sparse matrix multiplication (SpGEMM) is one of the fundamental linear operations in a wide variety of scientific applications. To implement efficient SpGEMM for many large-scale applications, this paper proposes scalable and optimized SpGEMM kernels based on the COO, CSR, ELL, and CSC formats on the Sunway TaihuLight supercomputer. First, a multi-level parallelism design for SpGEMM is proposed to exploit the parallelism of over 10 million cores and to control memory better on the special Sunway architecture. Optimization strategies, such as load balancing, coalesced DMA transmission, data reuse, vectorized computation, and parallel pipeline processing, are applied to further optimize the performance of the SpGEMM kernels. Second, we thoroughly analyze the performance of the proposed kernels. Third, a performance-aware model for SpGEMM is proposed to select the most appropriate compressed storage formats for the sparse matrices, so that SpGEMM achieves optimal performance on the Sunway. The experimental results show that the SpGEMM kernels have good scalability and meet the challenge of high-speed computing on large-scale data sets on the Sunway. In addition, the performance-aware model for SpGEMM achieves an average absolute relative error of 8.31 percent when the kernels are executed in a single process and 8.59 percent when the kernels are executed in multiple processes. These results show that the proposed performance-aware model is highly accurate and is precise enough to select the best formats for SpGEMM on the Sunway TaihuLight supercomputer.
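For readers unfamiliar with the operation, the following is a minimal, single-threaded sketch of SpGEMM on the CSR format using the standard row-by-row (Gustavson-style) formulation. It is not the paper's Sunway kernel: the helper names `dense_to_csr` and `spgemm_csr` are hypothetical, and none of the optimizations listed in the abstract (DMA coalescing, vectorization, pipelining) are modeled here.

```python
from typing import Dict, List, Sequence, Tuple

# A CSR matrix is represented here as (row_ptr, col_idx, values).
CSR = Tuple[List[int], List[int], List[float]]


def dense_to_csr(dense: Sequence[Sequence[float]]) -> CSR:
    """Convert a dense row-major matrix to CSR (illustrative helper)."""
    row_ptr, col_idx, values = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                values.append(float(v))
        # row_ptr[i + 1] marks the end of row i in col_idx / values.
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, values


def spgemm_csr(a: CSR, b: CSR) -> CSR:
    """Compute C = A * B with both inputs and the output in CSR.

    For each row i of A, every nonzero A[i, k] scales row k of B,
    and the scaled rows are accumulated into a sparse accumulator.
    """
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc: Dict[int, float] = {}  # column index -> partial sum for row i
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k = a_idx[p]
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_val[p] * b_val[q]
        # Emit the row with sorted column indices; entries that cancel
        # to exactly zero are kept as explicit zeros in this sketch.
        for j in sorted(acc):
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val


A = dense_to_csr([[1, 0, 2], [0, 3, 0]])
B = dense_to_csr([[0, 4], [5, 0], [6, 0]])
C = spgemm_csr(A, B)
print(C)
```

The irregular, data-dependent inner loop over rows of B is what makes format choice and load balancing matter for this operation.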

Bibliographic details

  • Author affiliations

    Hunan Univ Coll Informat Sci & Engn Changsha 410082 Hunan Peoples R China|Natl Supercomp Ctr Changsha Changsha 410082 Hunan Peoples R China;

    Hunan Univ Coll Informat Sci & Engn Changsha 410082 Hunan Peoples R China|Natl Supercomp Ctr Changsha Changsha 410082 Hunan Peoples R China;

    Hunan Univ Coll Informat Sci & Engn Changsha 410082 Hunan Peoples R China|Natl Supercomp Ctr Changsha Changsha 410082 Hunan Peoples R China;

    Hunan Univ Natl Supercomp Ctr Changsha Coll Informat Sci & Engn Changsha 410082 Hunan Peoples R China|Univ Waterloo David R Cheriton Sch Comp Sci Waterloo ON N2L 3G1 Canada;

    Jiangnan Inst Comp Technol State Key Lab Math Engn & Adv Comp Wuxi 214000 Jiangsu Peoples R China;

    Univ Florida Dept Elect & Comp Engn Gainesville FL 32611 USA;

  • Original format: PDF
  • Language: English
  • Keywords

    Heterogeneous many-core processor; parallelism; performance analysis; performance-aware; SpGEMM; Sunway TaihuLight supercomputer;

