International Conference on Computing, Networking and Communications

DS-Dedupe: A scalable, low network overhead data routing algorithm for inline cluster deduplication system

Abstract

Inline cluster deduplication has been widely used in data centers to improve storage efficiency. The data routing algorithm has a crucial impact on the deduplication factor, throughput, and scalability of a cluster deduplication system. In this paper, we propose a stateful data routing algorithm called DS-Dedupe. To make full use of the similarity in data streams, DS-Dedupe builds a similarity index at super-chunk granularity in each client to track the super-chunks that have already been routed. We then calculate a similarity coefficient from the index to determine whether a new super-chunk should be assigned directly or by consistent hashing, thereby striking a sensible tradeoff between deduplication factor and network overhead. Our experiments on two datasets demonstrate that DS-Dedupe achieves a high elimination ratio at a low communication overhead. Moreover, because data routing is performed by the client nodes, a metadata server bottleneck can be avoided.
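
The routing decision described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not define the similarity coefficient, the index contents, or any threshold. The sketch assumes that each super-chunk is summarized by its k smallest chunk fingerprints (a "handprint"), that the coefficient is the fraction of those fingerprints already indexed for one node, and that a threshold of 0.5 selects direct assignment. The names `HashRing`, `ClientRouter`, `handprint_size`, and `threshold` are all hypothetical.

```python
import bisect
import hashlib


def fingerprint(data: bytes) -> int:
    """SHA-1 fingerprint of a byte string, as an integer."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed on the ring at `vnodes` pseudo-random positions.
        self._ring = sorted(
            (fingerprint(f"{node}#{i}".encode()), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [key for key, _ in self._ring]

    def lookup(self, key: int) -> str:
        # First ring position clockwise from the key, wrapping around.
        idx = bisect.bisect_right(self._keys, key) % len(self._keys)
        return self._ring[idx][1]


class ClientRouter:
    """Per-client router with a super-chunk granularity similarity index."""

    def __init__(self, nodes, handprint_size=4, threshold=0.5):
        self.ring = HashRing(nodes)
        self.handprint_size = handprint_size  # assumed, not from the paper
        self.threshold = threshold            # assumed, not from the paper
        self.index = {}  # representative fingerprint -> node it was routed to

    def handprint(self, chunks):
        # Assumption: the k smallest distinct chunk fingerprints
        # represent the super-chunk.
        fps = sorted({fingerprint(c) for c in chunks})
        return fps[: self.handprint_size]

    def route(self, chunks):
        """Return (node, direct) for a super-chunk given as a list of chunks."""
        if not chunks:
            raise ValueError("a super-chunk needs at least one chunk")
        hp = self.handprint(chunks)

        # Count, per node, how many representative fingerprints it already holds.
        votes = {}
        for fp in hp:
            node = self.index.get(fp)
            if node is not None:
                votes[node] = votes.get(node, 0) + 1

        node, direct = None, False
        if votes:
            best = max(votes, key=votes.get)
            # Assumed similarity coefficient: matched fraction of the handprint.
            if votes[best] / len(hp) >= self.threshold:
                node, direct = best, True
        if node is None:
            # Not similar enough to anything seen: fall back to consistent hashing.
            node = self.ring.lookup(hp[0])

        # Record where this super-chunk went so later ones can follow it.
        for fp in hp:
            self.index[fp] = node
        return node, direct


if __name__ == "__main__":
    router = ClientRouter(["node0", "node1", "node2", "node3"])
    super_chunk = [b"alpha", b"beta", b"gamma", b"delta", b"epsilon"]
    print(router.route(super_chunk))  # first time: consistent-hash path
    print(router.route(super_chunk))  # second time: direct path, same node
```

In this sketch, routing the same super-chunk twice sends it to the same node, using the consistent-hash path the first time and the direct path the second time. All state lives in the client, which mirrors the abstract's point that no metadata server has to take part in the routing decision.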