Journal: Expert Systems with Applications

SBBS: A sliding blocking algorithm with backtracking sub-blocks for duplicate data detection



Abstract

With the explosive growth of data, storage systems face heavy storage pressure from the large volume of redundant data produced by duplicate copies of files or duplicate regions within files. Data deduplication is a storage-optimization technique that reduces the data footprint by eliminating redundant copies and storing only unique data. It rests on duplicate data detection techniques, which divide files into parts, compare corresponding parts between files using hashes, and identify the redundant data. This paper proposes SBBS, an efficient sliding blocking algorithm with backtracking sub-blocks for duplicate data detection. SBBS improves the duplicate detection precision of the traditional sliding blocking (SB) algorithm by backtracking the left/right 1/4 and 1/2 sub-blocks in segments where matching failed. Experimental results show that, on average, SBBS improves duplicate detection precision by 6.5% over the traditional SB algorithm and by 16.5% over the content-defined chunking (CDC) algorithm, without adding much storage overhead when files are divided into equal chunks of 8 kB.
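The abstract gives only a high-level description of the method, so the following Python sketch is one possible reading of it, not the authors' implementation. It assumes that a reference file is split into fixed-size blocks whose hashes are indexed together with the hashes of each block's left/right 1/2 and 1/4 sub-blocks, that a window slides over the new file one byte at a time when no full block matches, and that "backtracking" means checking the two ends of each unmatched segment against the sub-block hashes. The names `digest`, `build_index`, and `detect` are hypothetical, SHA-1 is an arbitrary choice of hash, and `size` is assumed to be a multiple of 4. Hashing every offset in full is a simplification; practical sliding-block implementations typically screen offsets with a cheap rolling checksum first.

```python
import hashlib


def digest(data: bytes) -> bytes:
    """Hash used to compare blocks (SHA-1 is an assumption, not from the paper)."""
    return hashlib.sha1(data).digest()


def build_index(old: bytes, size: int):
    """Index full blocks of `old` plus their left/right 1/2 and 1/4 sub-blocks."""
    full, left, right = set(), set(), set()
    for i in range(0, len(old) - size + 1, size):
        block = old[i:i + size]
        full.add(digest(block))
        for part in (size // 2, size // 4):
            left.add(digest(block[:part]))    # left sub-block of this length
            right.add(digest(block[-part:]))  # right sub-block of this length
    return full, left, right


def detect(new: bytes, index, size: int):
    """Return sorted (offset, length) spans of `new` judged to be duplicates."""
    full, left, right = index
    spans, gaps = [], []
    pos = gap_start = 0

    # Sliding-block pass: jump a whole block on a match, else slide one byte.
    while pos + size <= len(new):
        if digest(new[pos:pos + size]) in full:
            if pos > gap_start:
                gaps.append((gap_start, pos))  # unmatched segment before this block
            spans.append((pos, size))
            pos += size
            gap_start = pos
        else:
            pos += 1
    if gap_start < len(new):
        gaps.append((gap_start, len(new)))

    # Backtracking pass (assumed interpretation): test each unmatched segment's
    # head against left sub-blocks and its tail against right sub-blocks,
    # trying the 1/2 sub-block before the 1/4 sub-block.
    for start, end in gaps:
        covered_to = start
        for part in (size // 2, size // 4):
            if end - start >= part and digest(new[start:start + part]) in left:
                spans.append((start, part))
                covered_to = start + part
                break
        for part in (size // 2, size // 4):
            if end - part >= covered_to and digest(new[end - part:end]) in right:
                spans.append((end - part, part))
                break
    return sorted(spans)
```

As a usage example, take `old = bytes(range(24))` with `size = 8` and insert one byte in the middle of the second block: `new = old[:12] + b"\xff" + old[12:]`. The sliding pass alone matches only the first and third blocks, while `detect(new, build_index(old, 8), 8)` also recovers both halves of the damaged block, returning `[(0, 8), (8, 4), (13, 4), (17, 8)]`.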
