One Size Does Not Fit All: The Case for Chunking Configuration in Backup Deduplication


Abstract

Both enterprise and individual users regularly back up their data to protect it from unexpected loss. Various commercial deduplication systems and software packages help users eliminate duplicates in their backup data to save storage space. In a deduplication system, the chunking process splits data into small chunks, and duplicate data is identified by comparing the fingerprints of those chunks. The chunk size setting has a significant impact on deduplication performance, and a variety of chunking algorithms have been proposed in recent studies. In practice, existing systems often set the chunking configuration empirically: a chunk size of 4KB or 8KB is regarded as the sweet spot for good deduplication performance. However, users' data storage and access patterns vary and change over time. As a result, an empirically chosen chunk size may not yield a good deduplication ratio and can make storage capacity planning difficult. Moreover, chunking settings are hard to change once they are in use, because duplicates between data chunked with different chunk size settings cannot be eliminated directly. In this paper, we propose a sampling-based chunking method and develop a tool named SmartChunker to estimate the optimal chunking configuration for deduplication systems. Our evaluations on real-world datasets demonstrate the efficacy and efficiency of SmartChunker.
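To make the chunk-size sensitivity described in the abstract concrete, the following is a minimal sketch of fingerprint-based deduplication with fixed-size chunks. It is not SmartChunker and does not reproduce the paper's sampling-based method: it scans the whole input, and the function names (`chunk_fixed`, `dedup_ratio`, `make_demo_data`), the choice of SHA-1 as the fingerprint, and the synthetic input are all assumptions made for illustration.

```python
import hashlib
import random


def chunk_fixed(data: bytes, chunk_size: int) -> list[bytes]:
    """Split data into fixed-size chunks (the last chunk may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def dedup_ratio(data: bytes, chunk_size: int) -> float:
    """Return logical bytes divided by bytes stored after deduplication."""
    unique = {}  # fingerprint -> chunk length, one entry per distinct chunk
    for chunk in chunk_fixed(data, chunk_size):
        # SHA-1 is an assumed fingerprint function, chosen only for the demo.
        unique.setdefault(hashlib.sha1(chunk).digest(), len(chunk))
    stored = sum(unique.values())
    return len(data) / stored if stored else 1.0


def make_demo_data(block_size: int = 4096) -> bytes:
    """Build a toy 'backup' in which block 0 recurs between distinct blocks."""
    rng = random.Random(0)  # fixed seed so the demo is repeatable
    blocks = [bytes(rng.getrandbits(8) for _ in range(block_size))
              for _ in range(5)]
    order = [0, 1, 0, 2, 0, 3, 0, 4]
    return b"".join(blocks[i] for i in order)


if __name__ == "__main__":
    data = make_demo_data()
    for size in (2048, 4096, 8192, 16384):
        print(f"chunk size {size:>5} B: ratio {dedup_ratio(data, size):.2f}")
```

On this toy input, 4KB chunks line up with the repeated block and remove its extra copies, while 8KB chunks each pair the repeated block with a different neighbour, so no duplicates are found. Real backup data behaves less cleanly, and smaller chunks also increase the amount of fingerprint metadata to manage, which this sketch does not model.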


Bibliographic details

Source
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2018 | pp. 213-222 | 10 pages
Authors
Huijun Wu; Chen Wang; Kai Lu; Yinjin Fu; Liming Zhu
Original format: PDF
Keywords
Metadata; Tools; Linear programming; Software; Large-scale systems; Estimation error
