Conference: IEEE Intl Conf on Parallel Distributed Processing with Applications

CSF: An Efficient Parallel Deduplication Algorithm by Clustering Scattered Fingerprints

Abstract

Deduplication is one of the most effective and efficient techniques for saving memory space, and it is widely used in data centers and cloud storage systems. Processing multiple streams concurrently is expected to increase deduplication throughput. However, interleaving multiple data streams hurts the locality of the accessed data and weakens the benefit of concurrency, which poses a challenge for deduplication. An ordered index can usually restore the locality of the data streams and thereby improve the cache hit rate during deduplication. In this paper, we propose CSF, an efficient parallel deduplication algorithm that clusters scattered fingerprints to exploit data locality as much as possible. CSF improves the utilization of each fingerprint page by comparing clustered fingerprints against it together. Moreover, it carries scattered fingerprints over to the next round of fingerprint comparison and reuses the fingerprints on the same page, which reduces the number of fingerprint pages that must be read. We further optimize the algorithm with a scheduling strategy that schedules the tasks of some streams ahead of others while preserving overall performance. Finally, we evaluate the algorithm experimentally on various data sets. The results show that the proposed algorithm outperforms the state-of-the-art method.
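The abstract does not give the details of CSF, so the following is only a minimal sketch of the general idea it describes: grouping fingerprints from several streams by the page of an ordered index they fall on, so that each page is read once per round. All names (`cluster_by_page`, `dedup_clustered`, `interleaved_page_reads`, `page_bounds`) and the single-page cache model are assumptions made for illustration, not the authors' design. The carry-over of scattered fingerprints to the next round and the scheduling strategy are not modeled.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import zip_longest


def cluster_by_page(streams, page_bounds):
    """Group (stream_id, fingerprint) pairs by the index page that covers them.

    page_bounds[i] is the smallest fingerprint stored on page i + 1, so page i
    covers the half-open range [page_bounds[i - 1], page_bounds[i]).
    """
    clusters = defaultdict(list)
    for stream_id, fingerprints in streams.items():
        for fp in fingerprints:
            clusters[bisect_right(page_bounds, fp)].append((stream_id, fp))
    return clusters


def dedup_clustered(streams, index_pages, page_bounds):
    """Check all streams against the index, reading each needed page once.

    Returns (duplicates per stream, number of simulated page reads).
    New fingerprints are added to the page that covers them.
    """
    duplicates = defaultdict(list)
    page_reads = 0
    clusters = cluster_by_page(streams, page_bounds)
    for page_no in sorted(clusters):
        page = index_pages[page_no]  # one simulated page read
        page_reads += 1
        for stream_id, fp in clusters[page_no]:
            if fp in page:
                duplicates[stream_id].append(fp)
            else:
                page.add(fp)  # first occurrence: record the fingerprint
    return dict(duplicates), page_reads


def interleaved_page_reads(streams, page_bounds):
    """Count page reads when streams are interleaved round-robin and only
    the most recently read page is cached (hypothetical baseline)."""
    reads, cached = 0, None
    for group in zip_longest(*streams.values()):
        for fp in group:
            if fp is None:
                continue
            page_no = bisect_right(page_bounds, fp)
            if page_no != cached:
                reads, cached = reads + 1, page_no
    return reads


if __name__ == "__main__":
    page_bounds = [100, 200]               # three pages: <100, 100..199, >=200
    index_pages = [{5, 50}, {120}, {250}]  # fingerprints already stored
    streams = {"a": [5, 120, 250], "b": [250, 7, 120]}
    print(interleaved_page_reads(streams, page_bounds))
    print(dedup_clustered(streams, index_pages, page_bounds))
```

In this toy model, the interleaved baseline switches pages on almost every lookup, while the clustered pass touches each page once. Real systems would use cryptographic chunk hashes and on-disk pages rather than small integers and in-memory sets.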
