Overlapping communications in gyrokinetic codes on accelerator-based platforms

Abstract

Communication and computation overlapping techniques have been introduced in the five-dimensional gyrokinetic codes GYSELA and GKV. To anticipate some of the exascale requirements, these codes were ported to modern accelerators, the Xeon Phi KNL and the Tesla P100 GPU. On these accelerators, the serial versions of GYSELA on KNL and GKV on GPU are 1.3x and 7.4x faster, respectively, than on a single Skylake processor (a single socket). To assess scalability, we measured GYSELA performance on Xeon Phi KNL from 16 to 512 KNLs (1024 to 32k cores) and GKV performance on Tesla P100 GPU from 32 to 256 GPUs. In the parallel versions, transpose communication in the semi-Lagrangian solver in GYSELA or the convolution kernel in GKV turned out to be a main bottleneck. This indicates that network constraints will be critical at exascale. To mitigate the communication costs, pipeline-based and task-based overlapping techniques have been implemented in these codes. The GYSELA 2D advection solver has achieved a 33% to 92% speedup, and the GKV 2D convolution kernel has achieved a factor-of-2 speedup with pipelining. The task-based approach gives an 11% to 82% performance gain in the derivative computation of the electrostatic potential in GYSELA. We have shown that the pipeline-based approach is applicable in the presence of symmetry, while the task-based approach is applicable to more general situations.
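The pipelining idea mentioned in the abstract can be illustrated with a toy sketch. This is not code from GYSELA or GKV, and the abstract does not describe their implementation. It assumes the data can be split into independent chunks, uses a hypothetical `fake_transfer` (a `time.sleep`) in place of a real transpose communication, and uses a hypothetical `compute` in place of the local kernel. While one chunk is being processed, the transfer of the next chunk is already in flight on a background thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_transfer(chunk, delay=0.01):
    """Hypothetical stand-in for a communication step: wait, then deliver a copy."""
    time.sleep(delay)
    return list(chunk)


def compute(chunk, delay=0.01):
    """Hypothetical stand-in for the local kernel applied to a received chunk."""
    time.sleep(delay)
    return [2 * x for x in chunk]


def run_blocking(chunks):
    """Baseline: each chunk is transferred, then processed, with no overlap."""
    return [compute(fake_transfer(c)) for c in chunks]


def run_pipelined(chunks):
    """Overlap the transfer of chunk i+1 with the computation on chunk i."""
    if not chunks:
        return []
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fake_transfer, chunks[0])
        for next_chunk in chunks[1:]:
            received = pending.result()                       # wait for chunk i
            pending = pool.submit(fake_transfer, next_chunk)  # start chunk i+1
            results.append(compute(received))                 # runs during that transfer
        results.append(compute(pending.result()))             # last chunk, nothing left to overlap
    return results


if __name__ == "__main__":
    data = [[i, i + 1] for i in range(0, 16, 2)]
    for fn in (run_blocking, run_pipelined):
        start = time.perf_counter()
        fn(data)
        print(fn.__name__, f"{time.perf_counter() - start:.3f} s")
```

Both functions return the same results. In this toy setting the pipelined variant hides most of the transfer time behind computation, because the two stages take similar time and the chunks are independent.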

Bibliographic details

  • Source
    Concurrency, practice and experience | 2020, No. 5 | e5551.1-e5551.21 | 21 pages
  • Authors

  • Author affiliations

    CEA IRFM F-13108 St Paul Les Durance France|Natl Inst Quantum & Radiol Sci & Technol Rokkasho Fus Inst Aomori Japan;

    CEA IRFM F-13108 St Paul Les Durance France;

    Univ Paris Saclay Univ Paris Sud UVSQ Maison Simulat CEA CNRS Gif Sur Yvette France;

    Nagoya Univ Dept Phys Nagoya Aichi Japan;

    Japan Atom Energy Agcy CCSE Chiba Japan;

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    overlap; semi-Lagrangian; spectral; Tesla P100 GPU; transpose communication; Xeon Phi KNL;
