Conference: Parallel Processing Workshops, 2009. ICPPW '09

Analyzing Checkpointing Trends for Applications on the IBM Blue Gene/P System

Abstract

Current petascale systems have tens of thousands of hardware components and complex system software stacks, which increase the probability of faults occurring during the lifetime of a process. Checkpointing has been a popular method of providing fault tolerance in high-end systems. While considerable research has been done to optimize checkpointing, in practice the method still imposes a high overhead on users. In this paper, we study the checkpointing overhead seen by applications running on leadership-class machines such as the IBM Blue Gene/P at Argonne National Laboratory. We study various applications and design a methodology to help users understand and choose a checkpointing frequency and reduce the overhead incurred. In particular, we study three popular applications (the Grid-Based Projector-Augmented Wave application, the Car-Parrinello Molecular Dynamics application, and a Nek5000 computational fluid dynamics application) and analyze their memory usage and possible checkpointing trends on 32,768 processors of the Blue Gene/P system.