Conference: IEEE International Conference on Parallel and Distributed Systems

A similarity clustering-based deduplication strategy in cloud storage systems

Abstract

Deduplication is a data redundancy elimination technique that saves storage resources by removing redundant data in cloud storage systems. With the development of cloud computing, deduplication is increasingly applied in cloud data centers. However, traditional techniques struggle at big data scale to balance two conflicting goals: high deduplication throughput and a high duplicate elimination ratio. This paper proposes a similarity clustering-based deduplication strategy (SCDS), which aims to remove more duplicate data without significantly increasing system overhead. The main idea of SCDS is to narrow the query range of the fingerprint index using data partitioning and similarity clustering algorithms. In the data preprocessing stage, SCDS uses a data partitioning algorithm to group similar data together. In the data deletion stage, a similarity clustering algorithm assigns the fingerprint superblocks of similar data to the same cluster. Duplicate fingerprints are then detected within a single cluster, which speeds up duplicate fingerprint retrieval. Experiments show that SCDS achieves a better deduplication ratio than some existing similarity-based deduplication algorithms, while its overhead is only slightly higher than that of some methods with high throughput but a low deduplication ratio.
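
The idea of narrowing the fingerprint lookup to one cluster can be illustrated with a small sketch. This is not the SCDS algorithm itself, whose partitioning and clustering details are not given in the abstract. It assumes fixed-size chunking, SHA-1 fingerprints, and a min-hash-style stand-in for clustering in which the smallest fingerprint of a superblock serves as its cluster key. The names `CHUNK_SIZE`, `SUPERBLOCK_CHUNKS`, `superblocks`, and `deduplicate` are hypothetical.

```python
import hashlib
from collections import defaultdict

# Illustrative sizes only; real systems use chunks of several kilobytes.
CHUNK_SIZE = 4          # bytes per chunk (assumption)
SUPERBLOCK_CHUNKS = 4   # chunk fingerprints per superblock (assumption)


def fingerprint(chunk: bytes) -> str:
    """Return the SHA-1 hex digest used as the chunk fingerprint."""
    return hashlib.sha1(chunk).hexdigest()


def superblocks(data: bytes):
    """Chunk the data at fixed offsets and yield lists of fingerprints."""
    fps = [fingerprint(data[i:i + CHUNK_SIZE])
           for i in range(0, len(data), CHUNK_SIZE)]
    for i in range(0, len(fps), SUPERBLOCK_CHUNKS):
        yield fps[i:i + SUPERBLOCK_CHUNKS]


def deduplicate(data: bytes, index=None):
    """Return (unique_fingerprints, duplicate_count).

    `index` maps a cluster key to the set of fingerprints stored in that
    cluster. Each superblock is checked against one cluster only, so the
    lookup never scans the whole fingerprint index.
    """
    index = defaultdict(set) if index is None else index
    unique, duplicates = [], 0
    for block in superblocks(data):
        # Stand-in for similarity clustering: superblocks that share their
        # minimum fingerprint are treated as similar (min-hash heuristic).
        cluster = index[min(block)]
        for fp in block:
            if fp in cluster:
                duplicates += 1
            else:
                cluster.add(fp)
                unique.append(fp)
    return unique, duplicates
```

In this sketch, a duplicate chunk whose superblock maps to a different cluster goes undetected. That is the cost of the narrower lookup, and it mirrors the throughput versus deduplication ratio trade-off the abstract describes.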
