Current Trends in Information Technology (CTIT), 2009

An Efficient Indexing Mechanism for Data Deduplication



Abstract

At present, storage systems hold vast amounts of duplicated or redundant data. Data de-duplication can eliminate multiple copies of the same file, as well as duplicated segments or chunks of data within those files. De-duplication has therefore become an active research area in storage environments, especially in persistent storage for data centers, and many mechanisms have been proposed to perform it efficiently and save storage space. A key issue for data de-duplication is avoiding full-chunk indexing when determining whether incoming data is new, which is a time-consuming process. In this paper, we propose an efficient indexing mechanism for this problem that exploits the properties of the B+ tree. In the proposed system, a file is first divided into variable-length chunks using the Two Thresholds Two Divisors (TTTD) chunking algorithm. ChunkIDs are then obtained by applying a hash function to the chunks, and the resulting ChunkIDs serve as keys in a B+-tree-like index structure. The search time for duplicate file chunks is thus reduced from O(n) to O(log n), which avoids the cost of full-chunk indexing.
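The pipeline the abstract describes, TTTD chunking, hashing chunks into ChunkIDs, and a logarithmic-time index lookup, can be sketched roughly as follows. The thresholds, divisors, the simple additive rolling hash, and the bisect-based sorted list (standing in for a real B+ tree) are all illustrative assumptions, not the paper's exact design:

```python
import bisect
import hashlib

# Illustrative parameters; the paper does not specify exact values.
T_MIN, T_MAX = 64, 512      # minimum / maximum chunk size (bytes)
D_MAIN, D_BACKUP = 48, 24   # main and backup divisors
WINDOW = 16                 # rolling-hash window (bytes)

def tttd_chunks(data: bytes) -> list[bytes]:
    """Split data with the Two Thresholds Two Divisors rule: prefer a
    main-divisor breakpoint, remember backup-divisor breakpoints, and
    force a cut at T_MAX (at the last backup breakpoint, if any)."""
    chunks, start, backup, h = [], 0, -1, 0
    for i in range(len(data)):
        h += data[i]                    # additive rolling hash over the
        if i >= WINDOW:                 # last WINDOW stream bytes (a
            h -= data[i - WINDOW]       # stand-in for Rabin fingerprints)
        size = i - start + 1
        if size < T_MIN:
            continue
        if h % D_BACKUP == D_BACKUP - 1:
            backup = i                  # candidate fallback breakpoint
        if h % D_MAIN == D_MAIN - 1:
            chunks.append(data[start:i + 1])
            start, backup = i + 1, -1
        elif size >= T_MAX:
            cut = backup if backup >= start else i
            chunks.append(data[start:cut + 1])
            start, backup = cut + 1, -1
    if start < len(data):
        chunks.append(data[start:])     # trailing chunk
    return chunks

def chunk_id(chunk: bytes) -> str:
    """ChunkID = hash of the chunk contents (SHA-1 here)."""
    return hashlib.sha1(chunk).hexdigest()

class ChunkIndex:
    """Sorted-key index over ChunkIDs. A flat sorted list with binary
    search stands in for the paper's B+ tree; both answer the
    "have we seen this chunk?" query in O(log n) instead of an
    O(n) full-chunk scan."""
    def __init__(self):
        self.keys: list[str] = []

    def insert_if_new(self, cid: str) -> bool:
        """Index cid and return True if it was new, False if duplicate."""
        pos = bisect.bisect_left(self.keys, cid)
        if pos < len(self.keys) and self.keys[pos] == cid:
            return False
        self.keys.insert(pos, cid)
        return True
```

Feeding a repetitive byte stream through this sketch shows the intended effect: duplicate chunks are recognized by a log-time key lookup, so only unique chunks would need to be written to storage.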

