Checkpointing vs. Migration for Post-Petascale Supercomputers

Abstract

An alternative to classical fault-tolerant approaches for large-scale clusters is failure avoidance, by which the occurrence of a fault is predicted and a preventive measure is taken. We develop analytical performance models for two types of preventive measures: preventive checkpointing and preventive migration. We also develop an analytical model of the performance of a standard periodic checkpoint fault-tolerant approach. We instantiate these models for platform scenarios representative of current and future technology trends. We find that preventive migration is the better approach in the short term by orders of magnitude. However, in the longer term, both approaches have comparable merit with a marginal advantage for preventive checkpointing. We also find that standard non-prediction-based fault tolerance achieves poor scaling when compared to prediction-based failure avoidance, thereby demonstrating the importance of failure prediction capabilities. Finally, our results show that achieving good utilization in truly large-scale machines (e.g., 2^{20} nodes) for parallel workloads will require more than the failure avoidance techniques evaluated in this work.
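The scaling concern raised for standard periodic checkpointing can be illustrated with a rough back-of-the-envelope sketch. This is not the analytical model developed in the paper. It swaps in the textbook first-order approximation usually attributed to Young (period = sqrt(2 * C * MTBF)) and assumes independent node failures, so that platform MTBF shrinks as node MTBF divided by node count. The function names, the 10-year node MTBF, and the 10-minute checkpoint cost are all hypothetical values chosen only for illustration.

```python
import math


def platform_mtbf(node_mtbf_s: float, nodes: int) -> float:
    """Platform MTBF under the assumption of independent, identical node failures."""
    return node_mtbf_s / nodes


def young_period(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """First-order checkpoint period sqrt(2 * C * MTBF); reasonable only when C << MTBF."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


def first_order_waste(checkpoint_cost_s: float, mtbf_s: float) -> float:
    """Approximate fraction of time lost to checkpoints plus re-executed work.

    Ignores downtime and recovery cost, and is clamped to 1.0 because the
    approximation breaks down once the checkpoint cost approaches the MTBF.
    """
    period = young_period(checkpoint_cost_s, mtbf_s)
    waste = checkpoint_cost_s / period + period / (2.0 * mtbf_s)
    return min(waste, 1.0)


if __name__ == "__main__":
    node_mtbf = 10 * 365 * 24 * 3600.0  # assumed: 10 years per node, in seconds
    checkpoint_cost = 600.0             # assumed: 10 minutes per checkpoint
    for exponent in (10, 14, 17, 20):
        nodes = 2 ** exponent
        mtbf = platform_mtbf(node_mtbf, nodes)
        waste = first_order_waste(checkpoint_cost, mtbf)
        print(f"2^{exponent} nodes: platform MTBF {mtbf:,.0f} s, waste ~{waste:.0%}")
```

Under these assumed parameters the platform MTBF at 2^20 nodes falls below the checkpoint cost itself, so the clamped estimate saturates. That is the regime in which periodic checkpointing alone cannot sustain useful work, consistent with the abstract's remark that more than these techniques will be needed at that scale.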

Bibliographic details

Source: The 39th International Conference on Parallel Processing, 2010, pp. 168-177 (10 pages)
Authors: Cappello Franck; Casanova Henri; Robert Yves
Original format: PDF
Chinese Library Classification: Parallel computers
Keywords: checkpointing; failure prediction; migration; parallel jobs

