Conference: International conference on high performance computing

Efficiency of High Order Spectral Element Methods on Petascale Architectures


Abstract

High order methods for the solution of PDEs expose a tradeoff between computational cost and accuracy on a per degree of freedom basis. In many cases, the cost increases due to higher arithmetic intensity while affecting data movement minimally. As architectures tend towards wider vector instructions and expect higher arithmetic intensities, the best order for a particular simulation may change. This study highlights preferred orders by identifying the high order efficiency frontier of the spectral element method implemented in Nek5000 and NekBox: the set of orders and meshes that minimize computational cost at fixed accuracy. First, we extract Nek's order-dependent computational kernels and demonstrate exceptional hardware utilization by hardware-aware implementations. Then, we perform production-scale calculations of the nonlinear single mode Rayleigh-Taylor instability on BlueGene/Q and Cray XC40-based supercomputers to highlight the influence of the architecture. Accuracy is defined with respect to physical observables, and computational costs are measured by the core-hour charge of the entire application. The total number of grid points needed to achieve a given accuracy is reduced by increasing the polynomial order. Polynomial orders as high as 31 on the XC40 and 15 on BlueGene/Q come at no marginal cost per timestep. Taken together, these observations lead to a strong preference for high order discretizations that use fewer degrees of freedom. From a performance point of view, we demonstrate up to 60% full application bandwidth utilization at scale and achieve ≈1 PFlop/s of compute performance in Nek's most flop-intense methods.
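The efficiency frontier described in the abstract (the orders and meshes that minimize computational cost at fixed accuracy) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Run` record, the `efficiency_frontier` helper, and the sample numbers are hypothetical and are not taken from the paper, from Nek5000, or from NekBox.

```python
from typing import NamedTuple, Optional


class Run(NamedTuple):
    """One simulation run (hypothetical record layout, not from the paper)."""
    order: int      # polynomial order of the discretization
    elements: int   # number of spectral elements in the mesh
    error: float    # error in a physical observable
    cost: float     # core-hours charged for the entire application


def efficiency_frontier(
    runs: list[Run], tolerances: list[float]
) -> dict[float, Optional[Run]]:
    """For each accuracy tolerance, pick the cheapest run that meets it.

    Returns None for a tolerance that no run satisfies.
    """
    frontier: dict[float, Optional[Run]] = {}
    for tol in tolerances:
        feasible = [r for r in runs if r.error <= tol]
        frontier[tol] = min(feasible, key=lambda r: r.cost) if feasible else None
    return frontier


if __name__ == "__main__":
    # Placeholder numbers for illustration only; NOT results from the paper.
    sample_runs = [
        Run(order=7, elements=4096, error=1e-2, cost=100.0),
        Run(order=7, elements=32768, error=1e-3, cost=900.0),
        Run(order=15, elements=512, error=1e-3, cost=120.0),
        Run(order=15, elements=4096, error=1e-5, cost=1000.0),
    ]
    for tol, best in efficiency_frontier(sample_runs, [1e-2, 1e-3, 1e-4]).items():
        print(f"tolerance {tol:g}: {best}")
```

Measuring cost as the core-hour charge of the whole run and error against a physical observable follows the definitions in the abstract; how those two quantities are actually collected is outside the scope of this sketch.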