
A Duplicate Data Detection Algorithm Based on Extremum Defined Chunking


Abstract

Duplicate data detection can significantly reduce the volume of data stored in data centers, save network bandwidth, and lower construction and maintenance costs. To overcome the long-chunk problem of the Content Defined Chunking (CDC) method, a duplicate data detection algorithm based on Extremum Defined Chunking (EDC) is proposed. The EDC algorithm first computes the fingerprints of all sliding windows whose right boundary falls within the lower and upper chunk-size limits. It then finds the last extremum among these fingerprints, and the end position of the corresponding sliding window becomes the cut point of the chunk. Finally, the hash value of the chunk is computed to determine whether it is a duplicate. Experimental results show that the duplicate data detection rate and disk utilization of the EDC algorithm are 1.48 times and 1.12 times those of the CDC algorithm, respectively, a significant improvement.
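The chunking step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the window size, chunk-size limits, and the fingerprint function (here a stand-in built from MD5) are all assumptions, and "extremum" is taken to mean the maximum fingerprint value, keeping the last occurrence as the abstract specifies.

```python
import hashlib

WIN = 48          # sliding-window size (assumed)
MIN_CHUNK = 2048  # lower chunk-size limit (assumed)
MAX_CHUNK = 8192  # upper chunk-size limit (assumed)

def window_fingerprint(data: bytes) -> int:
    # Stand-in fingerprint: first 8 bytes of the window's MD5 digest.
    # The paper's actual fingerprint function is not given in the abstract.
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")

def edc_chunks(data: bytes):
    """Split data at the *last* fingerprint extremum among all sliding
    windows whose right boundary lies within [MIN_CHUNK, MAX_CHUNK]
    of the current chunk start."""
    start, n = 0, len(data)
    while start < n:
        hi = min(start + MAX_CHUNK, n)
        lo = min(start + MIN_CHUNK, n)
        if hi - start <= MIN_CHUNK:
            # Remaining data fits under the lower limit: emit as-is.
            yield data[start:n]
            break
        best_fp, cut = -1, hi
        for end in range(lo, hi + 1):
            fp = window_fingerprint(data[max(start, end - WIN):end])
            if fp >= best_fp:   # ">=" keeps the last extremum seen
                best_fp, cut = fp, end
        yield data[start:cut]
        start = cut

def is_duplicate(chunk: bytes, index: set) -> bool:
    # Second stage from the abstract: hash each chunk and look it up
    # in an index of previously seen chunk hashes.
    h = hashlib.sha1(chunk).digest()
    if h in index:
        return True
    index.add(h)
    return False
```

Because each cut point depends only on window contents near the boundary, inserting bytes early in the stream shifts at most a few chunks, which is what lets content-based schemes like this detect duplicates that fixed-size chunking misses.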

