Discrete Applied Mathematics

Similarity based deduplication with small data chunks


Abstract

Large backup and restore systems may have a petabyte or more of data in their repository. Such systems are often compressed by means of deduplication techniques, which partition the input text into chunks and store recurring chunks only once. One approach is to use hashing methods to store a fingerprint for each data chunk, detecting identical chunks with a very low probability of collision. As an alternative, it has been suggested to use similarity- rather than identity-based searches, which allows the definition of much larger chunks. This implies that the data structure needed to store the fingerprints is much smaller, so such a system may be more scalable than systems built on the first approach.
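As a minimal sketch of the identity-based scheme the abstract contrasts with, the following toy example partitions input into fixed-size chunks, fingerprints each chunk with SHA-256, and stores recurring chunks only once. The chunk size, function names, and the use of a plain dictionary as the chunk store are all illustrative assumptions, not the paper's method (the paper's similarity-based approach would instead match much larger chunks approximately rather than by exact fingerprint equality):

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative fixed chunk size; real systems often use content-defined chunking


def deduplicate(data: bytes, store: dict) -> list:
    """Split data into fixed-size chunks and store each distinct chunk
    once, keyed by its SHA-256 fingerprint. Returns the ordered list of
    fingerprints ("recipe") needed to reconstruct the input."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)  # a recurring chunk is stored only once
        recipe.append(fp)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Reassemble the original data from its fingerprint recipe."""
    return b"".join(store[fp] for fp in recipe)
```

For a repetitive input, the store holds far fewer bytes than the input itself, but the index must keep one fingerprint per (small) chunk; enlarging the chunks, as the similarity-based alternative permits, shrinks that index at the cost of exact-match detection.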
