Journal: Concurrency, practice and experience

Dynamic cluster strategy for hierarchical rollback-recovery protocols in MPI HPC applications

Abstract

Fault tolerance in parallel computing becomes increasingly important with a significant rise in high-performance computing systems. Coordinated checkpointing and message logging protocols are commonly used fault tolerance mechanisms for message-passing applications. However, these mechanisms are insufficient because of their severe drawbacks. Hierarchical rollback-recovery protocols, combining coordinated checkpointing with message logging, are a better solution. However, such protocols may not obtain the appropriate efficiency because the communication pattern in different stages of applications may vary at runtime. In an effort to improve the efficiency of hierarchical rollback-recovery protocols, we propose a dynamic cluster strategy to adapt to the runtime variation of communication pattern by using a prediction scheme. Finally, the efficiency and scalability of the dynamic cluster strategy are evaluated using 2 static process partition algorithms on the High-Performance Linpack benchmark.