Collocated Data Deduplication for Virtual Machine Backup in the Cloud.

Abstract

Cloud platforms that host a large number of virtual machines (VMs) have high storage demand for frequent backups of VM snapshots. Content-signature-based deduplication is necessary to eliminate excessive redundant blocks. While dedicated backup storage systems can be used to reduce data redundancy, such an architecture is expensive and introduces heavy network traffic in a large cluster. This thesis focuses on a low-cost backup and deduplication service collocated with other cloud services to reduce infrastructure and network cost.

Previous research on cluster-based data deduplication has concentrated on various inline solutions. The first part of the thesis is a highly parallel batched solution with synchronized backup that scales to a large number of virtual machines. The key idea is to separate duplicate detection from the actual storage backup, and to partition the global index and detection requests among machines using fingerprint values. Each machine then conducts duplicate detection independently, partition by partition, with minimal memory consumption. Another optimization is to allocate and control buffer space for exchanging detection requests and duplicate summaries among machines. The proposed solution requires very little memory and disk space, while backup efficiency in terms of overall throughput and time is not compromised. Our evaluation validates this and shows satisfactory backup throughput in a large cloud setting.

The second part of the thesis is a VM-centric collocated backup service with inline deduplication. The key difference from previous work is its fault resilience and low resource usage. We propose a multi-level selective deduplication scheme that integrates similarity-guided and popularity-guided duplicate elimination under a stringent resource requirement. This scheme uses popular common data to facilitate fingerprint comparison, localizes deduplication as much as possible within each VM, and associates underlying file blocks with one VM in most cases. The main advantage of this scheme is that it strikes a balance between intra-VM and inter-VM deduplication, increasing parallelism and improving reliability. Our analysis shows that this VM-centric scheme can provide better fault tolerance while using a small amount of computing and storage resources. We have conducted a comparative evaluation of this scheme's competitiveness in terms of deduplication efficiency and backup throughput.
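The batched approach in the first part of the thesis (partitioning the global index and the detection requests by fingerprint value, then detecting duplicates one partition at a time) might be sketched roughly as follows. This is a minimal single-process illustration, not the thesis implementation: the use of SHA-1, the partition count, and all function names (`partition_of`, `route_requests`, `detect_duplicates`) are assumptions made for the example.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical value; the abstract does not give one


def fingerprint(block: bytes) -> bytes:
    """Content signature of a block (SHA-1 is an assumption here)."""
    return hashlib.sha1(block).digest()


def partition_of(fp: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a fingerprint to a partition using its leading bytes."""
    return int.from_bytes(fp[:8], "big") % num_partitions


def route_requests(blocks_by_vm):
    """Group detection requests (vm, block index, fingerprint) by partition."""
    requests = defaultdict(list)
    for vm, blocks in blocks_by_vm.items():
        for idx, block in enumerate(blocks):
            fp = fingerprint(block)
            requests[partition_of(fp)].append((vm, idx, fp))
    return requests


def detect_duplicates(requests, global_index):
    """Process one partition at a time, touching only that partition's
    slice of the index. Returns a duplicate summary per VM."""
    summary = defaultdict(list)
    for part in sorted(requests):
        index_slice = global_index.setdefault(part, set())
        for vm, idx, fp in requests[part]:
            if fp in index_slice:
                summary[vm].append(idx)   # duplicate: block need not be stored
            else:
                index_slice.add(fp)       # first occurrence: record it
    return summary


snapshots = {"vm1": [b"A", b"B", b"A"], "vm2": [b"B", b"C"]}
index = {}
duplicates = detect_duplicates(route_requests(snapshots), index)
print(dict(duplicates))
```

Because identical blocks always hash to the same partition, each partition can be checked without consulting any other, which is what would allow different machines to own different partitions in a distributed setting.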
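The multi-level selective deduplication idea from the second part of the thesis can be illustrated with a two-level lookup: each block is first checked against fingerprints already stored for the same VM, then against a small index of popular blocks shared across VMs, and is otherwise stored as new data owned by that VM. The abstract does not specify the actual levels, the similarity measure, or the index structures, so everything below (the `top_k` cutoff, the function names, and the use of SHA-1) is a hypothetical sketch under those assumptions.

```python
import hashlib
from collections import Counter


def fingerprint(block: bytes) -> str:
    """Content signature of a block (SHA-1 is an assumption here)."""
    return hashlib.sha1(block).hexdigest()


def build_popular_index(all_snapshots, top_k):
    """Hypothetical popularity step: keep at most top_k fingerprints,
    ranked by how many VMs contain them, and only if shared by 2+ VMs."""
    counts = Counter()
    for blocks in all_snapshots.values():
        counts.update({fingerprint(b) for b in blocks})  # count each VM once
    return {fp for fp, n in counts.most_common(top_k) if n > 1}


def backup_vm(blocks, vm_index, popular_index):
    """Classify each block of one VM snapshot.

    vm_index is the set of fingerprints already stored for this VM;
    it is updated in place when a new block is stored.
    """
    result = []
    for block in blocks:
        fp = fingerprint(block)
        if fp in vm_index:
            result.append("local-dup")     # level 1: duplicate within the VM
        elif fp in popular_index:
            result.append("popular-dup")   # level 2: popular cross-VM block
        else:
            vm_index.add(fp)               # new block, owned by this VM only
            result.append("new")
    return result


fleet = {
    "vm1": [b"os", b"a"],
    "vm2": [b"os", b"b"],
    "vm3": [b"os", b"c"],
}
popular = build_popular_index(fleet, top_k=1)
print(backup_vm([b"os", b"a", b"a", b"x"], set(), popular))
```

In this sketch a block that is neither local nor popular is stored again even if another VM already holds it. That trades some deduplication efficiency for keeping most blocks associated with a single VM, which is the balance between intra-VM and inter-VM deduplication that the abstract describes.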

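Bibliographic record

  • Author: Zhang, Wei
  • Author affiliation: University of California, Santa Barbara
  • Degree-granting institution: University of California, Santa Barbara
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 128 p.
  • Total pages: 128
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification:
  • Keywords: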
