Scalable Heterogeneous Parallel Algorithms of PETSc and Their Performance Optimization for the Sunway TaihuLight

     

Abstract

PETSc (Portable, Extensible Toolkit for Scientific Computation), a common mathematical library, is a basic module of high-performance computing and one of the foundational algorithm libraries in a supercomputer's computing environment; its performance directly affects the efficiency of the high-performance numerical applications that call it. Targeting the Sunway TaihuLight, the world's first 100-petaflops heterogeneous supercomputer, we select two typical PETSc examples according to actual research needs, ex5 (solving a linear system on a single node) and ex19 (solving the 2D driven cavity problem on multiple nodes), for experimental study. Analysis of the runs shows that the hotspots are mainly seven core functions of the PETSc library. For these seven core functions (mainly vector and matrix operations), we propose and implement heterogeneous parallel algorithms and, based on the machine's heterogeneous architecture, corresponding performance optimization methods. Experiments on the supercomputer show that the parallel core functions achieve a maximum speedup of 16.4 on a single node with 4 main cores and 256 slave cores. On multiple nodes, with an input size of 16384, the speedup of 8192 nodes relative to 256 nodes is 32, and the speedup grows nearly linearly with the number of heterogeneous processors, indicating that the parallel algorithms for the PETSc core functions scale well on the Sunway TaihuLight.

Large-scale scientific and engineering computations such as hydrodynamics, numerical weather forecasting, seismic data processing, genetic engineering, and high-dimensional differential equations face major performance challenges. Meanwhile, High Performance Computing (HPC) platforms have developed significantly in recent years. The appearance of multi-core processors and heterogeneous computing platforms has dramatically improved the performance of high-performance applications. To fully utilize the computing power of HPC systems, it is necessary to develop specific methodologies that optimize application performance based on the system architecture. The Sunway TaihuLight supercomputer is presently ranked in the TOP500 list as the fastest supercomputer in the world, with a LINPACK benchmark rating of 93 petaflops. It uses a total of 40960 Chinese-designed SW26010 multi-core 64-bit RISC processors. The Portable, Extensible Toolkit for Scientific Computation (PETSc), an indispensable module of high-performance computing, is one of the basic algorithm libraries widely applied in high-performance applications. PETSc is also widely used for partial differential equations, sparse linear algebra, and related problems. The performance of PETSc directly affects the efficiency of the applications that invoke it. In this paper, according to actual research needs, we run two typical PETSc examples on the Sunway TaihuLight supercomputer: ex5 (solving linear systems on a single node) and ex19 (solving the 2D driven cavity problem on multiple nodes). From the analysis of the experimental results, we identify seven core hotspot functions, covering vector and matrix calculations. First, for each core function, we study in depth its characteristics, the difficulties of parallelizing it, and optimizations for its bottlenecks. Then, we determine an appropriate heterogeneous parallel model for each function on the SW26010 processor (there are four heterogeneous parallel models in total on the Sunway TaihuLight). Finally, we determine the best task division strategy and the size of the data transferred, and design the parallel algorithm on the Sunway TaihuLight supercomputer. Furthermore, we propose a series of novel performance optimization strategies based on the heterogeneous architecture of the Sunway TaihuLight system. These optimization methods mainly include memory access optimization, elimination of data dependencies, and vectorization. As the experimental results in this paper show, our parallel algorithms for the seven core functions achieve a maximum speedup of 16.4 on a single node (containing 4 MPEs and 256 CPEs). When running on multiple nodes with an input data scale of 16384, the speedup on 8192 nodes relative to 256 nodes reaches 32. In addition, the speedup grows nearly linearly with the number of processors. This paper demonstrates that our parallel algorithms for PETSc have good scalability, reliability, and security on the Sunway TaihuLight supercomputer, which provides a reference for similar research.

Bibliographic record

  • Source
    Chinese Journal of Computers (计算机学报) | 2017, No. 9 | pp. 2057-2069 | 13 pages
  • Author affiliations

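    College of Information Science and Engineering, Hunan University, National Supercomputing Center in Changsha, Changsha 410082;

    College of Information Science and Engineering, Hunan University, National Supercomputing Center in Changsha, Changsha 410082;

    College of Information Science and Engineering, Hunan University, National Supercomputing Center in Changsha, Changsha 410082;

    College of Information Science and Engineering, Hunan University, National Supercomputing Center in Changsha, Changsha 410082;

    College of Information Science and Engineering, Hunan University, National Supercomputing Center in Changsha, Changsha 410082;

    Jiangnan Institute of Computing Technology, Wuxi, Jiangsu 214125;

    Jiangnan Institute of Computing Technology, Wuxi, Jiangsu 214125;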

  • Original format: PDF
  • Language of text: Chinese
  • Chinese Library Classification: Theory, methods;
  • Keywords

    parallel algorithm design; PETSc mathematical library; scalability; Sunway TaihuLight;

  • Indexed on: 2023-07-25 14:02:41
