Journal: IEEE Transactions on Cloud Computing

Fault Tolerant Stencil Computation on Cloud-Based GPU Spot Instances

Abstract

This paper describes a fault-tolerant framework for distributed stencil computation on cloud-based GPU clusters. It uses pipelining to overlap data movement with computation in the halo region, and it parallelises data movement within the GPUs. Instead of running stencil codes on traditional clusters and supercomputers, the computation is performed on the Amazon Web Services GPU cloud and uses its spot instances to improve cost-efficiency. The implementation is based on a low-cost fault-tolerant mechanism that handles the possible termination of spot instances. Coupled with a price bidding module, our stencil framework optimizes not only for performance but also for cost. Experimental results show that our framework outperforms state-of-the-art solutions, achieving a peak of 25 TFLOPS for 2-D decomposition running on 512 nodes. We also show that the use of spot instances yields good cost-efficiency, increasing the average TFLOPS/USD from 132 to 360.
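The overlap of halo data movement with computation mentioned in the abstract can be illustrated with a small CPU-only sketch. This is not the paper's implementation: the 1-D three-point averaging stencil, the two-subdomain split, the thread pool standing in for asynchronous transfers, and every function name below (`stencil_point`, `exchange_halo`, `overlapped_step`, `run`) are assumptions chosen for illustration. A GPU version would typically issue the transfers and kernels on separate streams instead.

```python
from concurrent.futures import ThreadPoolExecutor


def stencil_point(left, centre, right):
    """3-point averaging stencil (a hypothetical stand-in for a real kernel)."""
    return (left + centre + right) / 3.0


def reference_step(grid):
    """One step on the undivided grid; the two end cells are fixed boundaries."""
    new = grid[:]
    for i in range(1, len(grid) - 1):
        new[i] = stencil_point(grid[i - 1], grid[i], grid[i + 1])
    return new


def exchange_halo(neighbour_edge_value):
    """Hypothetical stand-in for a network or PCIe transfer of one halo cell."""
    return neighbour_edge_value


def overlapped_step(left_part, right_part, pool):
    """One step on two subdomains, overlapping halo exchange with interior work."""
    # 1. Start both halo transfers asynchronously.
    halo_for_left = pool.submit(exchange_halo, right_part[0])
    halo_for_right = pool.submit(exchange_halo, left_part[-1])

    # 2. Meanwhile, update the interior cells, which need no halo data.
    new_left, new_right = left_part[:], right_part[:]
    for i in range(1, len(left_part) - 1):
        new_left[i] = stencil_point(left_part[i - 1], left_part[i], left_part[i + 1])
    for i in range(1, len(right_part) - 1):
        new_right[i] = stencil_point(right_part[i - 1], right_part[i], right_part[i + 1])

    # 3. Wait for the halos, then update the cells on the shared subdomain edge.
    new_left[-1] = stencil_point(left_part[-2], left_part[-1], halo_for_left.result())
    new_right[0] = stencil_point(halo_for_right.result(), right_part[0], right_part[1])
    return new_left, new_right


def run(grid, steps):
    """Split the grid in two halves (each at least 2 cells) and iterate."""
    half = len(grid) // 2
    left, right = grid[:half], grid[half:]
    with ThreadPoolExecutor(max_workers=2) as pool:
        for _ in range(steps):
            left, right = overlapped_step(left, right, pool)
    return left + right
```

Calling `run(grid, steps)` should give the same values as applying `reference_step` to the whole grid `steps` times, since the decomposition changes only the order in which cells are updated, not the arithmetic.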
