Conference: International Joint Conference on Computer Science and Software Engineering

A Transparent Hypervisor-level Checkpoint-Restart Mechanism for a Cluster of Virtual Machines

Abstract

A cluster of virtual machines is a common platform for running MPI applications in cloud computing environments. However, most traditional methods for providing fault tolerance to these applications are not fully transparent and require specific, checkpointing-enabled MPI software. This paper presents a novel transparent hypervisor-level checkpoint-restart mechanism, called Virtual Cluster Checkpoint-Restart (VCCR), which performs checkpoint and restart operations at the hypervisor level. VCCR is highly transparent to MPI applications and the guest OS. In VCCR, a software framework consisting of a controller and agent processes performs checkpoint and restart operations for the entire cluster. The checkpoint and restart protocols of VCCR are designed around the principles of barrier synchronization and virtual time to maintain global consistency and efficiency. We have developed a prototype of VCCR on top of the QEMU-KVM software and conducted two preliminary experiments using the NAS Parallel Benchmarks. Experimental results confirm that VCCR can correctly and efficiently checkpoint and restart a cluster of virtual machines.
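The controller-and-agent arrangement with barrier synchronization can be illustrated with a small sketch. This is not the VCCR protocol itself, whose details the abstract does not give. It is a generic barrier-coordinated checkpoint in Python, under the assumption that each agent pauses its VM, waits until every other VM is paused, saves state, and resumes only after every snapshot is complete. The names `Agent`, `checkpoint`, and `controller_checkpoint` are hypothetical, the VM state is a plain dictionary, and virtual time is not modeled.

```python
import threading


class Agent:
    """Hypothetical per-VM agent. A real agent would drive the hypervisor."""

    def __init__(self, name, state):
        self.name = name
        self.state = state            # stand-in for VM memory and device state
        self.paused = False
        self.snapshot = None
        self.saw_all_paused = False

    def checkpoint(self, barrier, peers):
        self.paused = True            # stop this VM before anything is saved
        barrier.wait()                # barrier 1: every VM in the cluster is paused
        # Between the two barriers no VM runs, so the saved states are
        # mutually consistent (no message is sent after a peer's snapshot).
        self.saw_all_paused = all(p.paused for p in peers)
        self.snapshot = dict(self.state)
        barrier.wait()                # barrier 2: every snapshot is complete
        self.paused = False           # resume together


def controller_checkpoint(agents):
    """Hypothetical controller: start one checkpoint round on all agents."""
    barrier = threading.Barrier(len(agents))
    threads = [
        threading.Thread(target=agent.checkpoint, args=(barrier, agents))
        for agent in agents
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {agent.name: agent.snapshot for agent in agents}


if __name__ == "__main__":
    cluster = [Agent(f"vm{i}", {"step": i}) for i in range(4)]
    print(controller_checkpoint(cluster))
```

The two barriers are the design point of the sketch: the first guarantees that no VM is still running when any state is saved, and the second keeps a VM that finishes early from resuming and communicating with a peer that is still saving.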
