Conference: 20th International Conference on Parallel and Distributed Computing Systems

FT-OpenVZ: A Virtualized Approach to Fault-Tolerance in Distributed Systems


Abstract

We present FT-OpenVZ, a full checkpointing and fault-tolerance solution for distributed computing on virtual private servers (VPSs). FT-OpenVZ extends OpenVZ's VPS checkpointing to provide full VPS checkpointing for MPI applications, including incremental file system checkpointing and user-assisted restart from checkpoints. With our solution, we extend the state of virtual machine/VPS fault-tolerance to any MPI-based distributed solution. By checkpointing all child processes, threads, and files within a distributed system, we provide a framework for future fault-tolerance work. For added resiliency to node failure, we include checkpoint replication and show that its use dramatically decreases the burden of checkpointing on network storage/centralized storage solutions. Using replication, FT-OpenVZ eliminates any need for network storage or centralized servers, reducing the impact of checkpointing on non-participating cluster nodes/users. Further, we show that by using replication our solution is scalable, where network storage and centralized server-based solutions are not. Our analysis is based on the NAS Parallel Benchmarks with cluster sizes up to 64 nodes. Using these benchmarks we examine the overhead of checkpointing with replication, demonstrating low overhead for virtualized checkpointing.

