Journal: IEEE Transactions on Parallel and Distributed Systems

Performance-Aware Model for Sparse Matrix-Matrix Multiplication on the Sunway TaihuLight Supercomputer

Abstract

General sparse matrix-sparse matrix multiplication (SpGEMM) is one of the fundamental linear operations in a wide variety of scientific applications. To implement efficient SpGEMM for many large-scale applications, this paper proposes scalable and optimized SpGEMM kernels based on the COO, CSR, ELL, and CSC formats on the Sunway TaihuLight supercomputer. First, a multi-level parallelism design for SpGEMM is proposed to exploit the parallelism of over 10 million cores and to better control memory on the special Sunway architecture. Optimization strategies, such as load balancing, coalesced DMA transmission, data reuse, vectorized computation, and parallel pipeline processing, are applied to further optimize the performance of the SpGEMM kernels. Second, we thoroughly analyze the performance of the proposed kernels. Third, a performance-aware model for SpGEMM is proposed to select the compressed storage formats for the sparse matrices that achieve the best SpGEMM performance on the Sunway. The experimental results show that the SpGEMM kernels scale well and meet the challenge of high-speed computing on large-scale data sets on the Sunway. In addition, the performance-aware model achieves an average absolute relative error of 8.31 percent when the kernels are executed in a single process and 8.59 percent when they are executed in multiple processes. These results indicate that the proposed performance-aware model is accurate enough to select the best formats for SpGEMM on the Sunway TaihuLight supercomputer.
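
For readers unfamiliar with the operation, the following is a minimal serial sketch of SpGEMM on the CSR format, using the well-known row-wise (Gustavson-style) formulation. It is not the paper's Sunway kernel and includes none of the optimizations listed in the abstract; the function name `spgemm_csr` and the argument names are hypothetical, chosen only for illustration.

```python
def spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val):
    """Compute C = A * B row by row, with A, B, and C all stored in CSR.

    Each matrix is given as (row pointers, column indices, values).
    Illustrative sketch only; not the kernel described in the paper.
    """
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}  # sparse accumulator for row i of C: column -> value
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            # Scale row k of B by A[i, k] and add it into row i of C.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        for j in sorted(acc):  # emit columns in ascending order
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val


# A = [[1, 0, 2],
#      [0, 3, 0]]
a_ptr, a_idx, a_val = [0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0]
# B = [[0, 4],
#      [5, 0],
#      [6, 0]]
b_ptr, b_idx, b_val = [0, 1, 2, 3], [1, 0, 0], [4.0, 5.0, 6.0]

# Expected product: C = [[12, 4],
#                        [15, 0]]
c_ptr, c_idx, c_val = spgemm_csr(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val)
```

The number of nonzeros in each output row is not known in advance, which is one reason the choice of storage format can affect SpGEMM performance.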

Bibliographic details

  • Source
  • Author affiliations

    Hunan Univ, Coll Informat Sci & Engn, Changsha 410082, Hunan, Peoples R China|Natl Supercomp Ctr Changsha, Changsha 410082, Hunan, Peoples R China;

    Hunan Univ, Coll Informat Sci & Engn, Changsha 410082, Hunan, Peoples R China|Natl Supercomp Ctr Changsha, Changsha 410082, Hunan, Peoples R China;

    Hunan Univ, Coll Informat Sci & Engn, Changsha 410082, Hunan, Peoples R China|Natl Supercomp Ctr Changsha, Changsha 410082, Hunan, Peoples R China;

    Hunan Univ, Natl Supercomp Ctr Changsha, Coll Informat Sci & Engn, Changsha 410082, Hunan, Peoples R China|Univ Waterloo, David R Cheriton Sch Comp Sci, Waterloo, ON N2L 3G1, Canada;

    Jiangnan Inst Comp Technol, State Key Lab Math Engn & Adv Comp, Wuxi 214000, Jiangsu, Peoples R China;

    Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA;

  • Indexing information
  • Original format: PDF
  • Language of text: English (eng)
  • CLC classification
  • Keywords

    Heterogeneous many-core processor; parallelism; performance analysis; performance-aware; SpGEMM; Sunway TaihuLight supercomputer;
