
Efficient Online Stream Deduplication for Network Block Storage



Abstract

Deduplication is an effective technique for optimizing storage utilization in data centers and cloud storage systems. It splits data into chunks and then identifies whether each chunk is unique. Fixed-size chunking (FSC), which places chunk boundaries at fixed byte intervals, is widely used in deduplication. Although simple and efficient, FSC suffers from the boundary-shift problem, which usually lowers the deduplication ratio. Content-defined chunking (CDC) has been proposed to solve this problem. However, applying CDC to deduplication for network block storage raises two challenges: one is how to establish a mapping scheme between the stream offsets of a deduplicated chunk and its block address; the other is how to design an efficient index structure to organize chunk metadata on disk. In this paper, we design two structures to solve the mapping problem and implement two backends, based on B+ trees and a hash table respectively, to store metadata on network block storage devices. To achieve better on-disk search performance, we reduce the size of the hash table and shrink the lookup range. We evaluate our schemes with real-world workloads. The experimental results show that our schemes achieve excellent search performance at an acceptable space cost.
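To illustrate the role of the metadata index, here is a toy in-memory fingerprint index. The paper's on-disk B+ tree and hash table backends answer the same lookup question ("has this chunk been seen before?") but are organized for disk access; the SHA-1 fingerprint and the Python dict here are assumptions for the sketch, not the paper's design.

```python
import hashlib

def deduplicate(stream: bytes) -> tuple[list[str], dict[str, bytes]]:
    """Map a stream to an ordered fingerprint recipe plus a unique-chunk store."""
    recipe = []  # ordered fingerprints reconstruct the original stream
    store = {}   # fingerprint -> chunk; in-memory stand-in for the on-disk index
    for chunk in cdc_chunks(stream):
        fp = hashlib.sha1(chunk).hexdigest()  # content fingerprint
        store.setdefault(fp, chunk)           # keep only the first (unique) copy
        recipe.append(fp)
    return recipe, store
```

Shrinking the lookup range, as the abstract describes for the hash-table backend, bounds how much of such an index must be read from disk per query.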
