Journal of Parallel and Distributed Computing
Large-scale performance of a DSL-based multi-block structured-mesh application for Direct Numerical Simulation

Abstract

SBLI (Shock-wave/Boundary-layer Interaction) is a large-scale Computational Fluid Dynamics (CFD) application, developed over 20 years at the University of Southampton and used extensively within the UK Turbulence Consortium. It can perform Direct Numerical Simulation (DNS) or Large Eddy Simulation (LES) of shock-wave/boundary-layer interaction problems over highly detailed multi-block structured-mesh geometries. SBLI presents major challenges in data organization and movement that must be overcome to sustain high performance on emerging massively parallel hardware platforms. In this paper we present research toward this goal using OPS, an embedded domain-specific language (DSL). OPS targets the domain of multi-block structured-mesh applications. It provides an API embedded in C/C++ and Fortran and uses automatic code generation and compilation to produce executables that run on a range of parallel hardware systems. The core functionality of SBLI is captured in a new framework called OpenSBLI, which lets a developer declare the partial differential equations in Einstein notation and then automatically carries out the discretization and generates OPS (C/C++) API code. OPS is then used to automatically generate a wide range of parallel implementations. Using this multi-layered abstraction approach, we demonstrate how new opportunities for further optimization, such as fine-tuning the computational intensity and reducing data movement, can be identified and applied automatically. Performance results show no performance loss from the high-level development strategy with OPS and OpenSBLI: performance matches or exceeds that of the hand-tuned original code on all CPU nodes tested. The data-movement optimizations provide over 3x speedups on CPU nodes, while GPUs provide 5x speedups over the best-performing CPU node.
The OPS-generated parallel code also demonstrates excellent scalability on nearly 100K cores on a Cray XC30 (ARCHER at EPCC) and on over 4K GPUs on a Cray XK7 (Titan at ORNL). (C) 2019 Elsevier Inc. All rights reserved.