Current Trends in Information Technology (CTIT), 2009

An Efficient Indexing Mechanism for Data Deduplication



Abstract

At present, storage systems hold vast amounts of duplicated or redundant data. Data de-duplication can eliminate multiple copies of the same file, as well as duplicated segments or chunks of data within those files. De-duplication has therefore become an active research area in storage environments, especially in persistent storage for data centers, and many mechanisms have been proposed to perform it efficiently and save storage space. A key issue for data de-duplication is avoiding full-chunk indexing when determining whether incoming data is new, which is a time-consuming process. In this paper, we propose an efficient indexing mechanism for this problem that exploits the properties of the B+ tree. In the proposed system, a file is first divided into variable-length chunks using the Two Thresholds Two Divisors (TTTD) chunking algorithm. ChunkIDs are then obtained by applying a hash function to the chunks, and the resulting ChunkIDs serve as keys in a B+-tree-like index structure. The search time for duplicate file chunks is thus reduced from O(n) to O(log n), which avoids the cost of full-chunk indexing.
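The pipeline the abstract describes, TTTD chunking, hashing chunks into ChunkIDs, and a logarithmic-time index lookup, can be sketched roughly as follows. The thresholds, divisors, the simple additive rolling hash, and the bisect-based sorted list (standing in for a real B+ tree) are all illustrative assumptions, not the paper's exact design:

```python
import bisect
import hashlib

# Illustrative parameters; the paper does not specify exact values.
T_MIN, T_MAX = 64, 512      # minimum / maximum chunk size (bytes)
D_MAIN, D_BACKUP = 48, 24   # main and backup divisors
WINDOW = 16                 # rolling-hash window (bytes)

def tttd_chunks(data: bytes) -> list[bytes]:
    """Split data with the Two Thresholds Two Divisors rule: prefer a
    main-divisor breakpoint, remember backup-divisor breakpoints, and
    force a cut at T_MAX (at the last backup breakpoint, if any)."""
    chunks, start, backup, h = [], 0, -1, 0
    for i in range(len(data)):
        h += data[i]                    # additive rolling hash over the
        if i >= WINDOW:                 # last WINDOW stream bytes (a
            h -= data[i - WINDOW]       # stand-in for Rabin fingerprints)
        size = i - start + 1
        if size < T_MIN:
            continue
        if h % D_BACKUP == D_BACKUP - 1:
            backup = i                  # candidate fallback breakpoint
        if h % D_MAIN == D_MAIN - 1:
            chunks.append(data[start:i + 1])
            start, backup = i + 1, -1
        elif size >= T_MAX:
            cut = backup if backup >= start else i
            chunks.append(data[start:cut + 1])
            start, backup = cut + 1, -1
    if start < len(data):
        chunks.append(data[start:])     # trailing chunk
    return chunks

def chunk_id(chunk: bytes) -> str:
    """ChunkID = hash of the chunk contents (SHA-1 here)."""
    return hashlib.sha1(chunk).hexdigest()

class ChunkIndex:
    """Sorted-key index over ChunkIDs. A flat sorted list with binary
    search stands in for the paper's B+ tree; both answer the
    "have we seen this chunk?" query in O(log n) instead of an
    O(n) full-chunk scan."""
    def __init__(self):
        self.keys: list[str] = []

    def insert_if_new(self, cid: str) -> bool:
        """Index cid and return True if it was new, False if duplicate."""
        pos = bisect.bisect_left(self.keys, cid)
        if pos < len(self.keys) and self.keys[pos] == cid:
            return False
        self.keys.insert(pos, cid)
        return True
```

Feeding a repetitive byte stream through this sketch shows the intended effect: duplicate chunks are recognized by a log-time key lookup, so only unique chunks would need to be written to storage.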

