Journal: IEEE Transactions on Computers

DARE: A Deduplication-Aware Resemblance Detection and Elimination Scheme for Data Reduction with Low Overheads

Abstract

Data reduction has become increasingly important in storage systems because of the explosive growth of digital data that has ushered in the big data era. One of the main challenges facing large-scale data reduction is how to detect and eliminate as much redundancy as possible at very low overhead. In this paper, we present DARE, a low-overhead deduplication-aware resemblance detection and elimination scheme that effectively exploits existing information for highly efficient resemblance detection in deduplication-based backup/archiving storage systems. The main idea behind DARE is to employ a scheme, called Duplicate-Adjacency based Resemblance Detection (), that considers any two data chunks to be similar (i.e., candidates for delta compression) if their respective adjacent data chunks are duplicates in a deduplication system, and then to further enhance resemblance detection efficiency with an improved super-feature approach. Our experimental results, based on real-world and synthetic backup datasets, show that DARE consumes only about 1/4 of the computation overhead and 1/2 of the indexing overhead required by traditional super-feature approaches, while detecting 2-10 percent more redundancy and achieving higher throughput. It does so by exploiting existing duplicate-adjacency information for resemblance detection and by finding the "sweet spot" for the super-feature approach.
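The duplicate-adjacency idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`fingerprint`, `adjacency_candidates`), the use of SHA-1 fingerprints, and the in-memory lists of chunks are all assumptions made for illustration, and the improved super-feature stage is not shown. The sketch marks a non-duplicate chunk as a delta-compression candidate when the chunk next to it is a duplicate of a stored chunk, pairing it with the stored chunk in the matching neighbouring position.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Hypothetical chunk fingerprint; SHA-1 is an assumption here."""
    return hashlib.sha1(chunk).hexdigest()


def adjacency_candidates(new_chunks, old_chunks):
    """Return {new_index: old_index} for non-duplicate chunks that sit
    directly next to a duplicate pair (illustrative sketch only)."""
    # Index stored chunks by fingerprint, keeping the first occurrence.
    old_index = {}
    for i, chunk in enumerate(old_chunks):
        old_index.setdefault(fingerprint(chunk), i)

    # Deduplication pass: map each duplicate incoming chunk to its stored copy.
    duplicates = {}
    for j, chunk in enumerate(new_chunks):
        i = old_index.get(fingerprint(chunk))
        if i is not None:
            duplicates[j] = i

    # Adjacency pass: the neighbours of a duplicate pair are treated as
    # similar, i.e. as candidates for delta compression.
    candidates = {}
    for j, i in duplicates.items():
        for step in (-1, 1):
            nj, ni = j + step, i + step
            in_range = 0 <= nj < len(new_chunks) and 0 <= ni < len(old_chunks)
            if in_range and nj not in duplicates and nj not in candidates:
                candidates[nj] = ni
    return candidates


if __name__ == "__main__":
    stored = [b"A", b"B", b"C", b"D"]
    incoming = [b"A", b"B-modified", b"C", b"D"]
    print(adjacency_candidates(incoming, stored))
```

In this toy input, only the second incoming chunk fails deduplication, and both of its neighbours are duplicates, so it is paired with the second stored chunk. The appeal described in the abstract is that this pairing reuses information the deduplication pass already produced, so no extra features need to be computed or indexed for such chunks.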