Data De-Duplication with Adaptive Chunking and Accelerated Modification Identifying

Computing and Informatics


Abstract

A data de-duplication system pursues not only a high de-duplication rate, i.e. the aggregate reduction in storage requirements gained from de-duplication, but also a high de-duplication speed. To address the arbitrary parameter settings of Content-Defined Chunking (CDC), a self-adaptive data chunking algorithm is proposed. It improves the de-duplication rate by running a pre-processing de-duplication pass on samples of the classified files and then selecting appropriate algorithm parameters. Meanwhile, FastCDC, a fast content-based data chunking algorithm, is adopted to address the low de-duplication speed of CDC. By introducing a de-duplication factor and an acceleration factor, and adjusting these two parameters, FastCDC can significantly boost de-duplication speed without sacrificing the de-duplication rate. The experimental results demonstrate that the proposed method improves the de-duplication rate by about 5 %, while FastCDC increases de-duplication speed by 50 % to 200 % at the cost of less than 3 % in de-duplication rate.
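The abstract names the CDC family and FastCDC without describing their mechanics, so the following sketch illustrates the underlying idea: a rolling hash is computed over the bytes, and a chunk boundary is declared wherever the hash matches a bit pattern, so boundaries move with the content rather than with fixed offsets. This is a minimal Python sketch of a Gear-style rolling-hash chunker of the kind FastCDC builds on; the table seed, size limits, and the function name cdc_chunks are illustrative assumptions, not values or interfaces from the paper.

```python
import random

# Gear table: one fixed 64-bit random constant per possible byte value.
_rng = random.Random(0x3FA9)            # fixed seed so the table is identical across runs
GEAR = [_rng.getrandbits(64) for _ in range(256)]

MIN_SIZE = 2 * 1024                     # assumed minimum chunk size (bytes)
AVG_SIZE = 8 * 1024                     # assumed target average chunk size
MAX_SIZE = 64 * 1024                    # assumed hard upper bound
MASK = AVG_SIZE - 1                     # boundary test: the low 13 bits of the hash are zero

def cdc_chunks(data: bytes):
    """Yield (offset, length) pairs for content-defined chunks."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        # Gear rolling hash: shift and add a per-byte random constant.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        length = i - start + 1
        if length < MIN_SIZE:
            continue                    # never cut below the minimum size
        if (h & MASK) == 0 or length >= MAX_SIZE:
            yield start, length         # boundary found (or forced at the maximum size)
            start, h = i + 1, 0
    if start < len(data):
        yield start, len(data) - start  # final partial chunk

# Usage: identical regions of two similar files produce identical chunks,
# which a de-duplication store detects by comparing a cryptographic hash per chunk.
if __name__ == "__main__":
    blob = bytes(random.Random(7).getrandbits(8) for _ in range(200_000))
    print(list(cdc_chunks(blob))[:5])
```

The full FastCDC scheme further cheapens the boundary judgement (for example by skipping the sub-minimum region, as above, and normalizing chunk sizes); the de-duplication factor and acceleration factor mentioned in the abstract tune that speed-versus-rate trade-off, and their exact definitions are given in the paper rather than in this sketch.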
