
Resemblance and mergence based indexing for high performance data deduplication



Abstract

Data deduplication, a data redundancy elimination technique, has been widely employed in many application environments to reduce data storage space. However, providing a fast and scalable key-value fingerprint index is challenging, particularly for large datasets, and index performance is critical to overall deduplication performance. This paper proposes RMD, a resemblance and mergence based deduplication scheme, which aims to provide quick responses to fingerprint queries. The key idea of RMD is to leverage a Bloom filter array and a data resemblance algorithm to dramatically reduce the query range. At ingest time, RMD uses a resemblance algorithm to detect similar data segments and places them in the same bin. As a result, at query time, it only needs to search the corresponding bin to detect duplicate content, which significantly speeds up the query process. Moreover, RMD uses a mergence strategy to accumulate similar segments into the relevant bins, and exploits a frequency-based fingerprint retention policy to cap the bin capacity, improving query throughput and the data deduplication ratio. Extensive experimental results with real-world datasets show that RMD achieves high query performance and outperforms several well-known deduplication schemes.
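The abstract does not give implementation details, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes SHA-1 chunk fingerprints, the smallest fingerprint of a segment as its resemblance signature (a min-hash style choice), one small Bloom filter per bin to route segments, and a least-frequent-first eviction rule to cap each bin. The names `ResemblanceIndex`, `Bin`, `BloomFilter`, and `fingerprint` are hypothetical.

```python
import hashlib
from collections import Counter


def fingerprint(chunk: bytes) -> str:
    """SHA-1 chunk fingerprint (an assumed choice of hash)."""
    return hashlib.sha1(chunk).hexdigest()


class BloomFilter:
    """Tiny Bloom filter: k hash positions over m slots (one byte per slot)."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _positions(self, item: str):
        for i in range(self.k):
            digest = hashlib.sha1(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


class Bin:
    """Fingerprints of resembling segments, capped by how often each was seen."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.freq = Counter()  # fingerprint -> number of times seen

    def merge(self, fps):
        """Merge a segment's fingerprints; return those stored before the merge."""
        duplicates = [fp for fp in fps if fp in self.freq]
        self.freq.update(fps)
        if len(self.freq) > self.capacity:
            # Assumed retention rule: keep only the most frequently seen entries.
            self.freq = Counter(dict(self.freq.most_common(self.capacity)))
        return duplicates


class ResemblanceIndex:
    """Routes each segment to one bin through an array of Bloom filters."""

    def __init__(self, bin_capacity: int = 1024):
        self.bin_capacity = bin_capacity
        self.filters = []  # Bloom filter array, one filter per bin
        self.bins = []

    def ingest(self, chunks):
        """Index a non-empty segment; return fingerprints detected as duplicates."""
        fps = [fingerprint(c) for c in chunks]
        rep = min(fps)  # assumed resemblance signature of the segment
        for bloom, bin_ in zip(self.filters, self.bins):
            if rep in bloom:  # a probable resembling bin: search only this one
                break
        else:  # no filter matched: open a new bin
            bloom, bin_ = BloomFilter(), Bin(self.bin_capacity)
            self.filters.append(bloom)
            self.bins.append(bin_)
        bloom.add(rep)
        return bin_.merge(fps)
```

In this sketch a Bloom filter false positive can only send a segment to a non-resembling bin; it cannot report a false duplicate, because duplicates are confirmed by an exact lookup inside the chosen bin. The trade-off is that duplicates stored in other bins, or evicted by the cap, go undetected.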

Bibliographic details

  • Source
    The Journal of Systems and Software | 2017, Issue 6 | pp. 11-24 | 14 pages
  • Author affiliations

    School of Computer, Huazhong University of Science and Technology, Wuhan, China; Wuhan National Laboratory for Optoelectronics, Wuhan, China

    Department of Computer and Information Sciences, Temple University, USA

    Department of Computer and Information Sciences, Temple University, USA; Department of Electrical and Computer Engineering, Virginia Commonwealth University, USA

    School of Computer, Huazhong University of Science and Technology, Wuhan, China; Wuhan National Laboratory for Optoelectronics, Wuhan, China

    School of Computer, Huazhong University of Science and Technology, Wuhan, China; Wuhan National Laboratory for Optoelectronics, Wuhan, China

  • Original format: PDF
  • Language: English
  • Keywords

    fast index; deduplication; resemblance mergence; fingerprint retrieval; key value index;

