Conference: 2012 IEEE Region 10 Conference: Sustainable Development through Humanitarian Technology

Improving the scalability of transparent checkpointing for GPU computing systems

Abstract

As the number of nodes in a GPU computing system increases, checkpointing to a global file system becomes more time-consuming because of I/O bottlenecks and network congestion. To address this problem, we propose a transparent and scalable checkpoint/restart mechanism for OpenCL applications, named Two-level CheCL. As its name implies, Two-level CheCL consists of two checkpoint implementations, Local CheCL and Global CheCL. Local CheCL avoids checkpointing to the global file system by using each node's local storage. Our experimental results show that Local CheCL can make checkpointing up to four times faster than a conventional checkpointing mechanism. We also implement Global CheCL, which uses a global file system, to ensure that a global checkpoint file is always available, even after a catastrophic failure. We discuss the performance of the proposed mechanism through an analysis with a two-level checkpoint model.
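The two-level idea described in the abstract can be illustrated with a small sketch. This is not the CheCL implementation: the class name `TwoLevelCheckpointer`, the `global_every` parameter, and the use of `pickle` on plain Python state (instead of OpenCL objects) are all assumptions made for illustration. It only shows the general policy of writing every checkpoint to node-local storage and promoting some of them to a shared global directory.

```python
import os
import pickle
import tempfile


class TwoLevelCheckpointer:
    """Illustrative two-level checkpoint policy (hypothetical, not CheCL itself)."""

    def __init__(self, local_dir, global_dir, global_every=4):
        self.local_dir = local_dir        # stands in for fast node-local storage
        self.global_dir = global_dir      # stands in for the global file system
        self.global_every = global_every  # assumed policy: every N-th checkpoint goes global
        self.count = 0

    def _write(self, directory, state):
        # Write to a temporary name first, then rename, so a crash never
        # leaves a half-written file under the final checkpoint name.
        path = os.path.join(directory, "checkpoint.pkl")
        tmp = path + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, path)

    def checkpoint(self, state):
        """Always checkpoint locally; additionally checkpoint globally every N-th call."""
        self.count += 1
        self._write(self.local_dir, state)
        if self.count % self.global_every == 0:
            self._write(self.global_dir, state)
            return "global"
        return "local"

    def restart(self):
        """Prefer the local checkpoint (never older than the global one); else fall back."""
        for directory in (self.local_dir, self.global_dir):
            path = os.path.join(directory, "checkpoint.pkl")
            if os.path.exists(path):
                with open(path, "rb") as f:
                    return pickle.load(f)
        return None
```

In this sketch, losing the local directory (for example, a node failure) means `restart()` falls back to the most recent global checkpoint, which may be up to `global_every - 1` checkpoints older. That trade-off between cheap frequent local checkpoints and expensive but durable global ones is the kind of behavior a two-level checkpoint model is meant to capture.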
