
Parallelizing and optimizing large-scale 3D multi-phase flow simulations on the Tianhe-2 supercomputer



Abstract

The lattice Boltzmann method (LBM) is a widely used computational fluid dynamics method for flow problems with complex geometries and various boundary conditions. Large-scale LBM simulations with increasing resolution and extended temporal range require massive high-performance computing (HPC) resources, motivating us to port the method onto modern many-core heterogeneous supercomputers such as Tianhe-2. Although many-core accelerators such as graphics processing units (GPUs) and the Intel MIC offer a dramatic advantage in floating-point performance and power efficiency over CPUs, they also pose a tough challenge for parallelizing and optimizing computational fluid dynamics codes on large-scale heterogeneous systems. In this paper, we parallelize and optimize the open-source 3D multi-phase LBM code openlbmflow on the Intel Xeon Phi (MIC) accelerated Tianhe-2 supercomputer, using a hybrid and heterogeneous MPI+OpenMP+Offload+single instruction, multiple data (SIMD) programming model. With cache blocking and a SIMD-friendly data-structure transformation, we dramatically improve SIMD and cache efficiency, raising single-thread performance on the CPU and the Phi by 7.9X and 8.8X, respectively, compared with the baseline code. To make the CPUs and Phi coprocessors collaborate efficiently, we propose a load-balance scheme that distributes workloads among the two CPUs and three Phi coprocessors within each node, and we use an asynchronous model to overlap the collaborative computation and communication as much as possible. The collaborative approach with two CPUs and three Phi coprocessors improves performance by around 3.2X compared with the CPU-only approach. Scalability tests show that openlbmflow achieves a parallel efficiency of about 60% on 2048 nodes, with about 400K cores in total. To the best of our knowledge, this is the largest-scale CPU-MIC collaborative LBM simulation of 3D multi-phase flow problems. Copyright © 2015 John Wiley & Sons, Ltd.
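The abstract does not detail the SIMD-friendly data-structure transformation, but the general technique is well known; below is a minimal sketch assuming a D3Q19-style lattice. An array-of-structures (AoS) layout interleaves the 19 populations of each cell, while a structure-of-arrays (SoA) layout keeps each velocity's populations contiguous, so the innermost loop is unit-stride and vectorizable. All identifiers (Q, NX, line_soa, relax_line, omega) are illustrative and are not taken from openlbmflow.

```c
/* Sketch of the AoS-to-SoA layout transformation for an LBM code.
 * Hypothetical names; not the authors' implementation. */

#define Q  19          /* D3Q19 lattice: 19 discrete velocities */
#define NX 256         /* cells along the innermost (x) dimension */

/* AoS: all Q populations of one cell are contiguous, so for a fixed
 * velocity q the stride between neighbouring cells is Q doubles,
 * which defeats vector loads. */
typedef struct { double f[Q]; } cell_aos;

/* SoA: for a fixed velocity q, the populations of successive cells
 * are contiguous, giving unit-stride access the compiler can vectorize. */
typedef struct { double f[Q][NX]; } line_soa;

/* BGK-style relaxation over one lattice line: f += omega * (feq - f). */
void relax_line(line_soa *restrict line, const double *restrict feq,
                double omega)
{
    for (int q = 0; q < Q; ++q) {
        #pragma omp simd   /* unit-stride inner loop; hint vectorization */
        for (int x = 0; x < NX; ++x)
            line->f[q][x] += omega * (feq[x] - line->f[q][x]);
    }
}
```

On both the Xeon and the Xeon Phi, unit-stride access is what allows the compiler to emit packed vector loads and stores; cache blocking then partitions the lattice so that each block's working set stays resident in cache between the collision and streaming steps.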
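The asynchronous CPU-Phi overlap is likewise only described at a high level. The sketch below uses the Intel Language Extensions for Offload (LEO) signal/wait clauses, which were the standard offload mechanism on Xeon Phi systems of this generation; the kernel names, the host/coprocessor buffer split, and the placeholder halo exchange are assumptions, not the authors' code.

```c
/* Sketch of asynchronous CPU+Phi collaboration: launch a non-blocking
 * offload to one Phi, compute the host's share of the domain meanwhile,
 * then wait and exchange halos. Hypothetical names throughout. */
#include <mpi.h>

void lbm_step_cpu(double *f, int ncells);                 /* host kernel */
__attribute__((target(mic))) void lbm_step_phi(double *f, int ncells);

void collaborative_step(double *f_host, int n_host,
                        double *f_phi, int n_phi, MPI_Comm comm)
{
    char sig;  /* completion tag for the asynchronous offload */

    /* Launch the Phi's share without blocking the host. */
    #pragma offload target(mic:0) signal(&sig) \
            inout(f_phi : length(n_phi))
    lbm_step_phi(f_phi, n_phi);

    /* Overlap: the CPU cores advance their own sub-domain meanwhile. */
    lbm_step_cpu(f_host, n_host);

    /* Wait for the coprocessor before touching its boundary data. */
    #pragma offload_wait target(mic:0) wait(&sig)

    /* Placeholder for the inter-node halo exchange. */
    MPI_Barrier(comm);
}
```

In a real run, MPI_Barrier would be replaced by a non-blocking MPI_Isend/MPI_Irecv halo exchange, extending the same compute-communication overlap across nodes; a full Tianhe-2 node would also drive mic:1 and mic:2 with their own signal tags.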

Bibliographic record

  • Source

    Concurrency and Computation: Practice and Experience

  • Author affiliations

    National University of Defense Technology, College of Computer, Changsha, China;

    National University of Defense Technology, College of Computer, Changsha, China;

    National University of Defense Technology, National Laboratory for Parallel and Distributed Processing, Changsha, China;

    National University of Defense Technology, College of Computer, Changsha, China;

    National University of Defense Technology, National Laboratory for Parallel and Distributed Processing, Changsha, China;

    National University of Defense Technology, College of Computer, Changsha, China;

    National University of Defense Technology, College of Computer, Changsha, China;

    National University of Defense Technology, College of Computer, Changsha, China;

    National University of Defense Technology, Changsha, China;

  • Indexing information
  • Original format: PDF
  • Language: eng
  • CLC classification
  • Keywords

    heterogeneous system; Intel Xeon Phi; Tianhe-2; multi-phase flow; LBM;

