Journal: IEEE Transactions on Computers

Similarity and Locality Based Indexing for High Performance Data Deduplication



Abstract

Data deduplication has gained increasing attention and popularity as a space-efficient approach in backup storage systems. One of the main challenges for centralized data deduplication is the scalability of fingerprint-index search. In this paper, we propose SiLo, a near-exact and scalable deduplication system that effectively and complementarily exploits similarity and locality of data streams to achieve high duplicate elimination, high throughput, and well-balanced load at extremely low RAM overhead. The main idea behind SiLo is to expose and exploit more similarity by grouping strongly correlated small files into a segment and segmenting large files, and to leverage the locality in the data stream by grouping contiguous segments into blocks to capture similar and duplicate data missed by the probabilistic similarity detection. SiLo also employs a locality-based stateless routing algorithm to parallelize and distribute data blocks to multiple backup nodes. By judiciously enhancing similarity through the exploitation of locality and vice versa, SiLo is able to significantly reduce RAM usage for index lookup, achieve near-exact duplicate-elimination efficiency, maintain a high deduplication throughput, and obtain load balance among backup nodes.
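The interplay of similarity and locality described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the details, so everything concrete here is an assumption made for illustration: fixed-size chunking, SHA-1 chunk fingerprints, the minimum fingerprint of a segment as its similarity representative, the tiny size constants, and the names `SimilarityLocalityIndex`, `make_segments`, and `route_block`. Only one small table (representative fingerprint to block ID) is meant to stay in RAM; a similarity hit on any segment pulls in the whole block, so duplicates in neighbouring segments are found even when their own representative does not match.

```python
import hashlib

# Illustrative sizes only (assumptions); real systems use far larger values.
CHUNK_SIZE = 4       # bytes per chunk
SEGMENT_CHUNKS = 2   # chunks per segment
BLOCK_SEGMENTS = 2   # contiguous segments per block


def fingerprint(chunk: bytes) -> str:
    """Chunk fingerprint (SHA-1 is an assumption of this sketch)."""
    return hashlib.sha1(chunk).hexdigest()


def make_segments(stream: bytes):
    """Split a stream into chunks and group their fingerprints into segments."""
    chunks = [stream[i:i + CHUNK_SIZE] for i in range(0, len(stream), CHUNK_SIZE)]
    fps = [fingerprint(c) for c in chunks]
    return [fps[i:i + SEGMENT_CHUNKS] for i in range(0, len(fps), SEGMENT_CHUNKS)]


def route_block(block_segments, num_nodes: int) -> int:
    """Stateless routing: the target node depends only on the block's content."""
    representative = min(min(seg) for seg in block_segments)
    return int(representative, 16) % num_nodes


class SimilarityLocalityIndex:
    def __init__(self):
        self.similarity_index = {}  # representative fingerprint -> block id (RAM)
        self.blocks = []            # block id -> chunk fingerprints (disk in practice)

    def deduplicate(self, stream: bytes):
        """Return (unique_chunks, duplicate_chunks) for one backup stream."""
        segments = make_segments(stream)
        unique = duplicate = 0
        for start in range(0, len(segments), BLOCK_SEGMENTS):
            block_segments = segments[start:start + BLOCK_SEGMENTS]

            # Similarity: look up each segment's representative fingerprint.
            # Locality: a hit loads the whole stored block, not just one segment.
            known = set()
            for seg in block_segments:
                block_id = self.similarity_index.get(min(seg))
                if block_id is not None:
                    known |= self.blocks[block_id]

            new_block = set()
            for seg in block_segments:
                for fp in seg:
                    if fp in known or fp in new_block:
                        duplicate += 1
                    else:
                        unique += 1
                    new_block.add(fp)

            # Record the new block and index its segments' representatives.
            block_id = len(self.blocks)
            self.blocks.append(new_block)
            for seg in block_segments:
                self.similarity_index[min(seg)] = block_id
        return unique, duplicate


if __name__ == "__main__":
    index = SimilarityLocalityIndex()
    print(index.deduplicate(b"AAAABBBBCCCCDDDD"))  # first backup
    print(index.deduplicate(b"AAAABBBBCCCCXXXX"))  # second backup, last chunk changed
    print(route_block(make_segments(b"AAAABBBBCCCCDDDD"), num_nodes=4))
```

In the second backup, the unchanged chunk `CCCC` is detected as a duplicate whether or not its own segment's representative matches, because the hit on the first segment already loaded the stored block. The detection is approximate by construction: a duplicate chunk is missed when no segment in its block produces a similarity hit, which is the sense in which such a scheme is near-exact rather than exact.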
