Venue: 4th International Workshop on Extreme Scale Programming Models and Middleware

Automatic Generation of High-Order Finite-Difference Code with Temporal Blocking for Extreme-Scale Many-Core Systems


Abstract

In this paper we describe the basic idea, implementation, and achieved performance of Formura, our DSL for stencil computation, on systems based on the PEZY-SC2 many-core processor. From a high-level description of the differential equation and a simple description of the finite-difference stencil, Formura generates the entire simulation code, including MPI parallelization with overlapped communication and calculation, advanced temporal blocking, and parallelization for many-core processors. The achieved performance is 4.78 PF, or 21.5% of the theoretical peak, for an explicit scheme for compressible CFD that is fourth-order accurate in space and third-order accurate in time. For a slightly modified implementation of the same scheme, the efficiency was lower (17.5%), but the actual calculation per timestep was 25% faster. Temporal blocking improved the performance by up to 70%. Even though the bytes-per-flop (B/F) ratio of the PEZY-SC2 is low, around 0.02, we achieved an efficiency comparable to that of highly optimized CFD codes on machines with much higher memory bandwidth, such as the K computer. We have demonstrated that automatic generation of code with temporal blocking is an effective way to use very large-scale machines with low memory bandwidth for large-scale CFD calculations.
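The temporal blocking mentioned in the abstract can be illustrated with a minimal sketch. This is not Formura's generated code: the 3-point averaging stencil, the periodic boundary, and the names step, naive, and temporally_blocked are all assumptions made for illustration. Each tile is loaded once together with a halo as wide as the number of fused timesteps, advanced several steps locally, and written back once, so main memory is touched far less often than in the naive sweep.

```python
def step(u):
    """One naive sweep of a 3-point averaging stencil (periodic boundary)."""
    n = len(u)
    return [0.25 * u[(i - 1) % n] + 0.5 * u[i] + 0.25 * u[(i + 1) % n]
            for i in range(n)]


def naive(u, steps):
    """Reference version: one full pass over the grid per timestep."""
    for _ in range(steps):
        u = step(u)
    return u


def temporally_blocked(u, steps, block, bt):
    """Advance `steps` timesteps, fusing up to `bt` of them per tile.

    `block` is the tile width and `bt` the temporal block size; both are
    hypothetical tuning parameters for this sketch.
    """
    n = len(u)
    done = 0
    while done < steps:
        t = min(bt, steps - done)          # timesteps fused in this round
        out = [0.0] * n
        for start in range(0, n, block):
            end = min(start + block, n)
            # Load the tile plus a halo of width t on each side.
            local = [u[i % n] for i in range(start - t, end + t)]
            for _ in range(t):
                # Each local sweep consumes one halo cell per side.
                local = [0.25 * local[j - 1] + 0.5 * local[j]
                         + 0.25 * local[j + 1]
                         for j in range(1, len(local) - 1)]
            out[start:end] = local         # exactly end - start cells remain
        u = out
        done += t
    return u
```

The halo cells are recomputed redundantly by neighbouring tiles, which trades extra arithmetic for fewer memory passes. That trade tends to pay off on a machine with a low B/F ratio, which is the situation the abstract describes.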