Assuring Demanded Read Performance of Data Deduplication Storage with Backup Datasets

Abstract

Data deduplication has been widely adopted in contemporary backup storage systems. It not only saves storage space considerably, but also shortens data backup time significantly. Since the major goal of the original data deduplication lies in saving storage space, its design has focused primarily on improving write performance by removing as much duplicate data as possible from incoming data streams. Although fast recovery from a system crash relies mainly on the read performance provided by deduplication storage, little investigation into read performance improvement has been made. In general, as the amount of deduplicated data increases, write performance improves accordingly, whereas the associated read performance becomes worse. In this paper, we propose a new deduplication scheme that assures the demanded read performance of each data stream while achieving write performance at a reasonable level, eventually being able to guarantee a target system recovery time. For this, we first propose an indicator called the cache-aware Chunk Fragmentation Level (CFL) that estimates degraded read performance on the fly by taking into account both incoming chunk information and read cache effects. We also show a strong correlation between this CFL and read performance in the backup datasets. In order to guarantee demanded read performance expressed in terms of a CFL value, we propose a read performance enhancement scheme called selective duplication that is activated whenever the current CFL becomes worse than the demanded one. The key idea is to judiciously write non-unique (shared) chunks into storage together with unique chunks unless the shared chunks exhibit good enough spatial locality. We quantify the spatial locality by using a selective duplication threshold value. Our experiments with actual backup datasets demonstrate that the proposed scheme achieves the demanded read performance in most cases at a reasonable cost in write performance.
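
The abstract only outlines the mechanism, so the following is a minimal, hypothetical sketch in Python of how a CFL-guided selective duplication decision could look. The container size, the exact CFL formulation (optimal container count over containers actually fetched), and the names CONTAINER_SIZE, cfl, and should_duplicate are assumptions made for illustration, not the paper's actual implementation.

# A minimal, hypothetical sketch of CFL-guided selective duplication
# (illustrative only; names and the exact CFL formula are assumptions).

CONTAINER_SIZE = 4 * 1024 * 1024  # assumed fixed-size on-disk container (4 MiB)


def cfl(stream_bytes, containers_to_read):
    """Cache-aware Chunk Fragmentation Level, modeled here as the ratio of
    the optimal container count (stream laid out perfectly sequentially)
    to the number of containers the read path would actually fetch after
    accounting for the read cache. 1.0 means no fragmentation penalty."""
    optimal = max(1, -(-stream_bytes // CONTAINER_SIZE))  # ceiling division
    return optimal / max(1, containers_to_read)


def should_duplicate(current_cfl, demanded_cfl, shared_run_chunks, threshold):
    """Selective duplication decision for a run of shared (non-unique) chunks:
    rewrite them next to the unique chunks only when the stream's current CFL
    has dropped below the demanded CFL and the shared run is too short
    (poor spatial locality, measured against the selective duplication
    threshold) to be read efficiently from its existing location."""
    return current_cfl < demanded_cfl and shared_run_chunks < threshold


# Example: a 100 MiB stream that currently needs 60 container reads is
# fragmented (CFL = 25/60, about 0.42); with a demanded CFL of 0.6 and a
# threshold of 8 chunks, a 3-chunk shared run would be rewritten (duplicated).
if __name__ == "__main__":
    current = cfl(100 * 1024 * 1024, 60)
    print(round(current, 2), should_duplicate(current, 0.6, 3, 8))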
