Journal: Future Generation Computer Systems

A Channel Memory based fault tolerance for MPI applications

Abstract

Fault-tolerant message-passing environments protect parallel applications against node failures. Very large-scale computing systems, ranging from large clusters to worldwide Global Computing systems, require a high level of fault tolerance to run parallel applications efficiently. The Channel Memory approach provides the infrastructure for scalable tolerance of simultaneous faults. Combined with a specially designed checkpointing system and recovery protocol, this approach has resulted in the MPICH-V architecture. In this paper, we describe CMDE, a stand-alone distributed program system that is based on the MPICH-V architecture and implements an approach to tolerating faults of Channel Memories.