Conference: International Conference on High Performance Computing and Communications; International Conference on Smart City; IEEE International Conference on Data Science and Systems

Improving performance for simulating complex fluids on massively parallel computers by component loop-unrolling and communication hiding



Abstract

Due to the complex geometries and physical models of real-world engineering applications, the parallel performance of mainstream computational fluid dynamics (CFD) codes is unsatisfactory. For complex fluids, an extra stress tensor with nine components, governed by constitutive equations, adds a substantial amount of computation. This paper focuses on optimizing the most compute-intensive part of a complex-fluid simulation: the iterative linear solver for the multi-component equations. Based on OpenFOAM, the most widely used open-source CFD code, we unrolled the component loops and replaced the blocking collective MPI calls with non-blocking communications. After rescheduling operations across the loops, the collective communications can be partly overlapped with computation. Using the preconditioned conjugate gradient (PCG) algorithm as an example, we present the complete loop-unrolled algorithm for solving multi-component equations. Numerical experiments on a demonstrative case with 2 million cells showed an 8.0% to 29.0% reduction in simulation time on 64 to 2048 cores. Notably, the proposed approach is a high-level scheduling algorithm and can be combined with other intra-component optimizations, e.g. pipelined CG methods.
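The abstract does not spell out the unrolled PCG, so what follows is only a minimal serial sketch of the idea in Python, under stated assumptions. All components share one symmetric positive definite matrix, a Jacobi (diagonal) preconditioner is used, and `ReductionCounter`, `pcg_multi`, `matvec` and `dot` are hypothetical names. Moving the component loop inside the CG iteration lets each reduction phase pack one or two values per component into a single global sum. In an MPI code, that packed sum could be posted as a non-blocking `MPI_Iallreduce` and completed with `MPI_Wait` after independent work. The overlap itself is not simulated here, and this is not the authors' OpenFOAM implementation.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


class ReductionCounter:
    """Hypothetical serial stand-in for a communicator; counts global reductions."""

    def __init__(self) -> None:
        self.calls = 0

    def allreduce(self, local_values: List[float]) -> List[float]:
        # With a single process the global sum equals the local value.
        # In an MPI code this would be ONE (possibly non-blocking) allreduce
        # over the whole packed list.
        self.calls += 1
        return list(local_values)


def matvec(a: Matrix, x: Vector) -> Vector:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(x: Vector, y: Vector) -> float:
    return sum(xi * yi for xi, yi in zip(x, y))


def pcg_multi(a: Matrix, rhs_list: List[Vector], comm: ReductionCounter,
              tol: float = 1e-10, max_iter: int = 100) -> List[Vector]:
    """Jacobi-preconditioned CG for several right-hand sides sharing matrix `a`.

    The component loop sits inside the iteration loop, so every reduction
    phase packs the scalars of all components into a single allreduce.
    """
    n, ncomp = len(a), len(rhs_list)
    inv_diag = [1.0 / a[i][i] for i in range(n)]
    x = [[0.0] * n for _ in range(ncomp)]
    r = [list(b) for b in rhs_list]  # r = b - A*x with x = 0
    z = [[d * ri for d, ri in zip(inv_diag, rk)] for rk in r]
    p = [list(zk) for zk in z]

    # Initial packed reduction: (r.z, r.r) for all components in one call.
    local: List[float] = []
    for k in range(ncomp):
        local += [dot(r[k], z[k]), dot(r[k], r[k])]
    glob = comm.allreduce(local)
    rz = [glob[2 * k] for k in range(ncomp)]
    active = [glob[2 * k + 1] ** 0.5 >= tol for k in range(ncomp)]

    for _ in range(max_iter):
        if not any(active):
            break
        ap = [matvec(a, p[k]) if active[k] else [] for k in range(ncomp)]
        # Packed reduction 1: p.Ap of every active component in one call.
        pap = comm.allreduce([dot(p[k], ap[k]) if active[k] else 0.0
                              for k in range(ncomp)])
        local = []
        for k in range(ncomp):
            if not active[k]:
                local += [0.0, 0.0]
                continue
            alpha = rz[k] / pap[k]
            x[k] = [xi + alpha * pi for xi, pi in zip(x[k], p[k])]
            r[k] = [ri - alpha * qi for ri, qi in zip(r[k], ap[k])]
            z[k] = [d * ri for d, ri in zip(inv_diag, r[k])]
            local += [dot(r[k], z[k]), dot(r[k], r[k])]
        # Packed reduction 2: (r.z, r.r) of every component in one call.
        glob = comm.allreduce(local)
        for k in range(ncomp):
            if not active[k]:
                continue
            rz_new, rr = glob[2 * k], glob[2 * k + 1]
            if rr ** 0.5 < tol:
                active[k] = False  # this component has converged
                continue
            beta = rz_new / rz[k]
            rz[k] = rz_new
            p[k] = [zi + beta * pi for zi, pi in zip(z[k], p[k])]
    return x
```

Solving three right-hand sides with one `pcg_multi` call needs 1 + 2 × (largest iteration count) reduction calls. Three separate single-component calls need one initial reduction each plus two per iteration of each component. Packing replaces many small, latency-bound collectives with fewer, slightly larger ones, which is one plausible reason component-level restructuring helps at high core counts.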
