
Efficient Online Stream Deduplication for Network Block Storage



Abstract

Deduplication is an effective technique for optimizing storage utilization in data centers and cloud storage systems. It splits data into chunks and then identifies whether each chunk is unique. Fixed-size chunking (FSC), which places chunk boundaries at fixed byte intervals, is widely used in deduplication. Although simple and efficient, FSC suffers from the boundary-shift problem, which usually lowers the deduplication ratio. Content-defined chunking (CDC) has been proposed to solve this problem. However, applying CDC to deduplication for network block storage raises two challenges: one is how to establish a mapping scheme between the stream offsets of a deduplicated chunk and its block address; the other is how to design an efficient index structure to organize chunk metadata on disk. In this paper, we design two structures to solve the mapping problem and implement two backends, based on B+ trees and a hash table respectively, to store metadata on network block storage devices. To achieve better on-disk search performance, we reduce the size of the hash table and shrink the lookup range. We evaluate our schemes with real-world workloads. The experimental results show that our schemes achieve excellent search performance at an acceptable space cost.
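To illustrate the role of the metadata index, here is a toy in-memory fingerprint index. The paper's on-disk B+ tree and hash table backends answer the same lookup question ("has this chunk been seen before?") but are organized for disk access; the SHA-1 fingerprint and the Python dict here are assumptions for the sketch, not the paper's design.

```python
import hashlib

def deduplicate(stream: bytes) -> tuple[list[str], dict[str, bytes]]:
    """Map a stream to an ordered fingerprint recipe plus a unique-chunk store."""
    recipe = []  # ordered fingerprints reconstruct the original stream
    store = {}   # fingerprint -> chunk; in-memory stand-in for the on-disk index
    for chunk in cdc_chunks(stream):
        fp = hashlib.sha1(chunk).hexdigest()  # content fingerprint
        store.setdefault(fp, chunk)           # keep only the first (unique) copy
        recipe.append(fp)
    return recipe, store
```

Shrinking the lookup range, as the abstract describes for the hash-table backend, bounds how much of such an index must be read from disk per query.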
