Journal of Parallel and Distributed Computing

Large-scale performance of a DSL-based multi-block structured-mesh application for Direct Numerical Simulation


Abstract

SBLI (Shock-wave/Boundary-layer Interaction) is a large-scale Computational Fluid Dynamics (CFD) application, developed over 20 years at the University of Southampton and extensively used within the UK Turbulence Consortium. It is capable of performing Direct Numerical Simulation (DNS) or Large Eddy Simulation (LES) of shock-wave/boundary-layer interaction problems over highly detailed multi-block structured-mesh geometries. SBLI presents major challenges in data organization and movement that need to be overcome for continued high performance on emerging massively parallel hardware platforms. In this paper we present research towards achieving this goal through the OPS embedded domain-specific language. OPS targets the domain of multi-block structured-mesh applications. It provides an API embedded in C/C++ and Fortran and makes use of automatic code generation and compilation to produce executables capable of running on a range of parallel hardware systems. The core functionality of SBLI is captured using a new framework called OpenSBLI, which enables a developer to declare the partial differential equations using Einstein notation and then automatically carry out discretization and generation of OPS (C/C++) API code. OPS is then used to automatically generate a wide range of parallel implementations. Using this multi-layered abstraction approach, we demonstrate how new opportunities for further optimization can be gained, such as fine-tuning the computational intensity and reducing data movement, and how these optimizations can be applied automatically. Performance results demonstrate that there is no performance loss due to the high-level development strategy with OPS and OpenSBLI, with performance matching or exceeding the hand-tuned original code on all CPU nodes tested. The data movement optimizations provide over 3x speedups on CPU nodes, while GPUs provide 5x speedups over the best-performing CPU node. The OPS-generated parallel code also demonstrates excellent scalability on nearly 100K cores on a Cray XC30 (ARCHER at EPCC) and on over 4K GPUs on a Cray XK7 (Titan at ORNL). © 2019 Elsevier Inc. All rights reserved.
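
To give a feel for what "declaring equations in Einstein notation and discretizing them automatically" involves, the following is a minimal hand-written sketch in plain Python. It is not OpenSBLI or OPS code; the function names `central_diff` and `divergence`, the grid size, and the test field are all assumptions made for illustration. It expands the repeated index in the term d(u_j)/d(x_j) into one derivative per spatial dimension and evaluates each with a second-order central difference on a uniform 2D structured mesh.

```python
# Hypothetical illustration only: this is NOT OpenSBLI or OPS code.
# It shows, by hand, the kind of expansion such a framework automates.


def central_diff(field, axis, i, j, h):
    """Second-order central difference of a 2D field at interior point (i, j)."""
    if axis == 0:
        return (field[i + 1][j] - field[i - 1][j]) / (2.0 * h)
    return (field[i][j + 1] - field[i][j - 1]) / (2.0 * h)


def divergence(u, h):
    """Evaluate d(u_j)/d(x_j) at interior points of a uniform mesh.

    `u` is a list of 2D component fields; the repeated index j is summed.
    Boundary points are left at 0.0 because no boundary scheme is defined here.
    """
    nx, ny = len(u[0]), len(u[0][0])
    div = [[0.0] * ny for _ in range(nx)]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            # Einstein summation: one derivative term per spatial dimension.
            div[i][j] = sum(
                central_diff(u[axis], axis, i, j, h) for axis in range(len(u))
            )
    return div


h = 0.1  # assumed uniform grid spacing
n = 5    # assumed number of points per direction

# Test field u = (x, 2y), whose exact divergence is 3 everywhere.
u = [
    [[i * h for j in range(n)] for i in range(n)],
    [[2.0 * j * h for j in range(n)] for i in range(n)],
]
div = divergence(u, h)
```

In a code-generation workflow of the kind the abstract describes, the nested loop over mesh points would not be written by hand but emitted as a parallel loop for the chosen target (CPU or GPU); the sketch only shows the arithmetic that such a loop body performs.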
