
Optimizing and Scaling HPCG on Tianhe-2: Early Experience

Abstract

In this paper, we make a first attempt at optimizing and scaling HPCG on Tianhe-2, the world's largest supercomputer. This early work focuses on optimizing the CPU code without using the Intel Xeon Phi coprocessors. We reformulate the basic CG algorithm to minimize the cost of collective communication, and we apply several optimization techniques, including SIMDization, loop unrolling, fusion of the forward and backward sweeps, and OpenMP parallelization, to further improve the performance of kernels such as sparse matrix-vector multiplication, symmetric Gauss-Seidel relaxation, and the geometric multigrid V-cycle. We scale the HPCG code from 256 up to 6,144 nodes (147,456 CPU cores) on Tianhe-2 with nearly ideal weak scalability, reaching an aggregate performance of 79.83 Tflops, 6.38X higher than the reference implementation.
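The abstract does not say exactly how CG is reformulated. One well-known way to cut collective communication is the Chronopoulos-Gear variant, which rearranges the recurrences so that the two inner products of each iteration are computed together and can be merged into a single global reduction instead of two. The following is a serial C sketch under stated assumptions, not the authors' code. The function names (`laplacian_1d_apply`, `fused_dots`, `cg_single_reduction`), the 1-D Laplacian model problem (HPCG itself uses a 27-point 3-D stencil), and the omission of preconditioning are all hypothetical choices. A comment marks where a distributed version would issue its one reduction.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model operator: y = A x for the 1-D Laplacian
   A = tridiag(-1, 2, -1) of order n, which is symmetric positive definite.
   HPCG itself uses a 27-point 3-D stencil; this is only a small stand-in. */
static void laplacian_1d_apply(int n, const double *x, double *y) {
    for (int i = 0; i < n; ++i) {
        double v = 2.0 * x[i];
        if (i > 0) v -= x[i - 1];
        if (i < n - 1) v -= x[i + 1];
        y[i] = v;
    }
}

/* Computes (r, r) and (w, r) in one pass over the data. In a distributed
   code each rank would produce these two local partial sums and combine
   them with ONE global reduction of a 2-element buffer (for example a
   single MPI_Allreduce), instead of two separate reductions. */
static void fused_dots(int n, const double *r, const double *w,
                       double *rr, double *wr) {
    double s_rr = 0.0, s_wr = 0.0;
    for (int i = 0; i < n; ++i) {
        s_rr += r[i] * r[i];
        s_wr += w[i] * r[i];
    }
    /* single global reduction of {s_rr, s_wr} would go here */
    *rr = s_rr;
    *wr = s_wr;
}

/* Unpreconditioned Chronopoulos-Gear CG for A x = b, starting from the
   initial guess in x. Stops when ||r||^2 <= tol2 * ||b||^2 or after
   max_iter iterations; returns the number of iterations performed. */
static int cg_single_reduction(int n, const double *b, double *x,
                               double tol2, int max_iter) {
    assert(n > 0);
    double *r = malloc((size_t)n * sizeof *r);
    double *w = malloc((size_t)n * sizeof *w);
    double *p = malloc((size_t)n * sizeof *p);
    double *s = malloc((size_t)n * sizeof *s);
    assert(r && w && p && s);

    laplacian_1d_apply(n, x, w);                 /* w = A x (scratch) */
    double bb = 0.0;
    for (int i = 0; i < n; ++i) {
        r[i] = b[i] - w[i];                      /* r = b - A x */
        bb += b[i] * b[i];
    }
    laplacian_1d_apply(n, r, w);                 /* w = A r */
    double gamma, delta;
    fused_dots(n, r, w, &gamma, &delta);
    double alpha = (delta != 0.0) ? gamma / delta : 0.0;
    for (int i = 0; i < n; ++i) { p[i] = r[i]; s[i] = w[i]; } /* s = A p */

    int k = 0;
    while (k < max_iter && gamma > tol2 * bb) {
        for (int i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * s[i];
        }
        laplacian_1d_apply(n, r, w);             /* the only matvec per iteration */
        double gamma_new;
        fused_dots(n, r, w, &gamma_new, &delta); /* the only reduction per iteration */
        double beta = gamma_new / gamma;
        /* (p, A p) = delta - beta * gamma_new / alpha_old, so no extra dot is needed */
        alpha = gamma_new / (delta - beta * gamma_new / alpha);
        for (int i = 0; i < n; ++i) {
            p[i] = r[i] + beta * p[i];
            s[i] = w[i] + beta * s[i];           /* keeps s = A p by recurrence */
        }
        gamma = gamma_new;
        ++k;
    }
    free(r); free(w); free(p); free(s);
    return k;
}
```

Each iteration performs one matrix-vector product and one fused reduction. Textbook CG needs two separate reductions per iteration, one for (p, Ap) and one for (r, r). The price of the rearrangement is one extra vector recurrence for s = Ap. On a machine with thousands of nodes, halving the number of latency-bound global reductions is usually worth that extra local work.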