Published in: The Sixteenth IEEE International Conference on Computational Science and Engineering

An Application-Level Synchronous Checkpoint-Recover Method for Parallel CFD Simulation

Abstract

High Performance Computing (HPC) is increasingly used to accelerate Computational Fluid Dynamics (CFD) simulation. However, large-scale parallel CFD simulation faces serious reliability problems, and fault-tolerance techniques must be adopted to keep it running efficiently. In this paper, we propose an application-level synchronous checkpoint-recovery method for parallel CFD simulation that builds on the application features of CFD simulation. In this method, the periodic snapshot output of the CFD simulation is naturally treated as a blocking coordinated checkpoint, and all processes can resume execution from the latest checkpoint regardless of how many processes have failed. We design a synchronous checkpoint-recovery framework for CFD simulation and implement it in the open-source software OpenFOAM. Experimental results demonstrate that our method supports fault tolerance well in large-scale parallel CFD applications, adding very little overhead beyond the original cost of the periodic CFD snapshot output.
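The idea in the abstract, reusing the periodic snapshot as a coordinated checkpoint and restarting every process from the latest complete one, can be illustrated with a small sketch. This is not the paper's framework and does not use OpenFOAM or MPI: the "ranks" are simulated in a single Python process, and the names `write_snapshot`, `load_latest`, `advance`, `run`, the `COMPLETE` marker file, and the `fail_at` parameter are all hypothetical, chosen only for illustration.

```python
import json
import os


def write_snapshot(ckpt_dir, step, fields):
    """Write one file per simulated rank, then commit the snapshot.

    The COMPLETE marker is an assumption of this sketch: it stands in for
    the point where a blocking, coordinated checkpoint would know that all
    processes have finished writing.
    """
    snap = os.path.join(ckpt_dir, f"step_{step}")
    os.makedirs(snap, exist_ok=True)
    for rank, value in enumerate(fields):
        with open(os.path.join(snap, f"rank_{rank}.json"), "w") as f:
            json.dump(value, f)
    open(os.path.join(snap, "COMPLETE"), "w").close()


def load_latest(ckpt_dir, n_ranks):
    """Return (step, fields) from the latest committed snapshot.

    Falls back to step 0 with zero-valued fields if no snapshot exists.
    """
    done = [
        int(name[len("step_"):])
        for name in os.listdir(ckpt_dir)
        if name.startswith("step_")
        and os.path.exists(os.path.join(ckpt_dir, name, "COMPLETE"))
    ]
    if not done:
        return 0, [0.0] * n_ranks
    step = max(done)
    fields = []
    for rank in range(n_ranks):
        with open(os.path.join(ckpt_dir, f"step_{step}", f"rank_{rank}.json")) as f:
            fields.append(json.load(f))
    return step, fields


def advance(fields):
    """Toy stand-in for one solver time step on every rank."""
    return [value + rank + 1 for rank, value in enumerate(fields)]


def run(ckpt_dir, n_ranks, n_steps, write_interval, fail_at=None):
    """Time-step loop that always starts from the latest committed snapshot."""
    step, fields = load_latest(ckpt_dir, n_ranks)
    while step < n_steps:
        if fail_at is not None and step == fail_at:
            # Simulated crash; every rank stops, whichever one failed.
            raise RuntimeError("simulated process failure")
        fields = advance(fields)
        step += 1
        if step % write_interval == 0:
            # The periodic snapshot output doubles as the checkpoint.
            write_snapshot(ckpt_dir, step, fields)
    return fields
```

Calling `run` on an empty directory with `fail_at` set raises partway through; calling `run` again on the same directory without `fail_at` resumes from the last committed snapshot and, in this toy setting, reaches the same final state as an uninterrupted run. Because recovery restarts all ranks from one shared snapshot, the number of ranks that failed does not change the procedure.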
