IEEE International Conference on Parallel and Distributed Systems

CACH-Dedup: Content Aware Clustered and Hierarchical Deduplication



Abstract

Distributed deduplication overcomes, to some extent, the index-lookup disk bottleneck by dividing deduplication tasks among many nodes. However, selecting these nodes is an important challenge because a poor choice can incur high communication cost and the storage-node island effect. Moreover, intelligent data routing is required to exploit the peculiar nature of data from different applications, which share an insignificant amount of content. In this paper, we explore CACH-Dedup, a content-aware clustered and hierarchical deduplication system, which exploits the negligibly small amount of content shared among chunks from different file types to create groups of files and storage nodes without loss of deduplication effectiveness. It uses hierarchical deduplication to reduce the size of fingerprint indexes at the global level, where only files and large segments are deduplicated. It also exploits locality, first through the large segments deduplicated at the global level and second by routing sets of consecutive files together to one storage node. Furthermore, it exploits similarity through per-stream similarity Bloom filters used for stateful routing, achieving a duplicate-elimination rate on par with single-node deduplication at minimal computation and communication cost. CACH-Dedup is evaluated using a prototype deployed on a Windows Server environment distributed over four separate machines. It is shown to achieve duplicate-elimination effectiveness on par with a single-node deduplication system, with minimal communication overhead and acceptable deduplication throughput.
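The similarity-based stateful routing described in the abstract can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: it assumes each storage node summarizes the chunk fingerprints it holds in a Bloom filter, and an incoming stream is routed to the node whose filter matches the most of the stream's fingerprints. All names here (`BloomFilter`, `route_stream`) and the parameter choices are hypothetical.

```python
import hashlib

class BloomFilter:
    """Compact set summary: membership tests may give false positives,
    never false negatives. Sizes here are illustrative assumptions."""

    def __init__(self, size=1 << 16, hashes=4):
        self.size = size
        self.hashes = hashes
        self.bits = bytearray(size // 8)

    def _positions(self, item):
        # Derive `hashes` independent bit positions from one SHA-256 digest family.
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def contains(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

def route_stream(fingerprints, node_filters):
    """Stateful routing sketch: estimate each node's shared content with the
    incoming stream by counting fingerprint hits in its Bloom filter, and
    return the index of the best-matching node."""
    def score(bf):
        return sum(bf.contains(fp) for fp in fingerprints)
    return max(range(len(node_filters)), key=lambda i: score(node_filters[i]))
```

Because the filters are small fixed-size bitmaps, this comparison avoids shipping full fingerprint indexes between nodes, which is consistent with the abstract's claim of minimal communication cost for stateful routing.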
