IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

One Size Does Not Fit All: The Case for Chunking Configuration in Backup Deduplication



Abstract

Data backup is routinely required by both enterprises and individual users to protect their data against unexpected loss. Many commercial data deduplication systems and software products help users eliminate duplicates in their backup data to save storage space. In a data deduplication system, the chunking process splits data into small chunks, and duplicate data is identified by comparing the chunks' fingerprints. The chunk size setting has a significant impact on deduplication performance, and a variety of chunking algorithms have been proposed in recent studies. In practice, however, existing systems often set the chunking configuration empirically, with a chunk size of 4KB or 8KB regarded as the sweet spot for good deduplication performance. Yet users' data storage and access patterns vary and change over time, so an empirical chunk size setting may not yield a good deduplication ratio and can make storage capacity planning difficult. Moreover, the chunking settings are hard to change once they are in use, because duplicates between data chunked with different chunk size settings cannot be eliminated directly. In this paper, we propose a sampling-based chunking method and develop a tool named SmartChunker to estimate the optimal chunking configuration for deduplication systems. Our evaluations on real-world datasets demonstrate the efficacy and efficiency of SmartChunker.
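The abstract does not describe how SmartChunker works internally, so the following is only an illustrative sketch of the general idea of comparing chunking configurations on a sample, not the authors' method. It assumes Gear-based content-defined chunking (a simplified cut rule in the spirit of FastCDC), SHA-1 chunk fingerprints, and a hypothetical 40-byte metadata cost per unique chunk. Without some per-chunk cost, smaller chunks would always win, which hides the trade-off the paper is about. The function names `cdc_chunks`, `dedup_ratio`, and `pick_chunk_size`, and the choice to sample whole backup streams, are assumptions made for illustration.

```python
import hashlib
import random

# Gear table: one pseudo-random 64-bit value per byte value (fixed seed so
# chunk boundaries are reproducible across runs).
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
MASK64 = (1 << 64) - 1


def cdc_chunks(data: bytes, avg_size: int) -> list[bytes]:
    """Gear-based content-defined chunking (simplified, FastCDC-style cut).

    A cut point is declared when the top log2(avg_size) bits of the rolling
    hash are zero, subject to min/max chunk lengths of avg/4 and 4*avg.
    """
    if avg_size <= 0 or avg_size & (avg_size - 1):
        raise ValueError("avg_size must be a power of two")
    bits = avg_size.bit_length() - 1
    min_size, max_size = avg_size // 4, avg_size * 4
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Shift-and-add: after 64 bytes, older bytes fall out of the hash.
        h = ((h << 1) + GEAR[byte]) & MASK64
        length = i - start + 1
        if (length >= min_size and h >> (64 - bits) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def dedup_ratio(streams, avg_size, meta_bytes=40):
    """Logical bytes / physical bytes, where physical bytes are unique chunk
    data plus an assumed fixed index-entry cost (meta_bytes) per unique chunk."""
    seen = set()
    logical = physical = 0
    for data in streams:
        for chunk in cdc_chunks(data, avg_size):
            logical += len(chunk)
            fp = hashlib.sha1(chunk).digest()  # chunk fingerprint
            if fp not in seen:  # first occurrence: store data + index entry
                seen.add(fp)
                physical += len(chunk) + meta_bytes
    return logical / physical if physical else 1.0


def pick_chunk_size(streams, candidates, sample_fraction=0.1, seed=0):
    """Score each candidate average chunk size on a random sample of streams;
    return the best candidate and all scores (hypothetical helper)."""
    if not streams:
        raise ValueError("need at least one stream")
    k = min(len(streams), max(1, round(len(streams) * sample_fraction)))
    sample = random.Random(seed).sample(streams, k)
    scores = {size: dedup_ratio(sample, size) for size in candidates}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Two synthetic "backups": the second is the first with a small insertion.
    rng = random.Random(7)
    v1 = bytes(rng.getrandbits(8) for _ in range(100_000))
    v2 = v1[:30_000] + b"edited" + v1[30_000:]
    best, scores = pick_chunk_size([v1, v2], [1024, 4096, 16384], sample_fraction=1.0)
    print(best, scores)
```

Because boundaries depend on content rather than fixed offsets, the small insertion in `v2` only disturbs the chunks around the edit, which is why content-defined chunking is the usual choice for backup streams. Sampling whole streams, as done here, can under-count duplicates that span streams left out of the sample; a real estimator would need to account for that.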
