
Design of an exact data deduplication cluster


Abstract

Data deduplication is an important component of enterprise storage environments. The throughput and capacity limitations of single-node solutions have led to the development of clustered deduplication systems. Most implemented clustered inline solutions trade deduplication ratio for performance and are willing to miss opportunities to detect redundant data that a single-node system would detect. We present an inline deduplication cluster with a joint distributed chunk index, which is able to detect as much redundancy as a single-node solution. The use of locality and load-balancing paradigms enables the nodes to minimize information exchange. We are therefore able to show that, contrary to claims in previous papers, it is possible to combine exact deduplication, small chunk sizes, and scalability within one environment using only a commodity GBit Ethernet interconnect. Additionally, we investigate the throughput and scalability limitations, with a special focus on the intra-node communication.
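
To illustrate what "exact" means here, the following is a minimal sketch, not the system described in the abstract. It assumes fixed-size chunking, SHA-1 fingerprints, and in-process sets standing in for cluster nodes; the names `ShardedChunkIndex`, `owner`, and `count_new_chunks` are hypothetical. Because every fingerprint maps to exactly one owning shard, the sharded index should make the same duplicate decisions as a single index, whatever the number of shards.

```python
import hashlib

CHUNK_SIZE = 8  # assumption: tiny fixed-size chunks, for demonstration only


def chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


def fingerprint(chunk: bytes) -> bytes:
    """Identify a chunk by its SHA-1 digest (an assumed choice of hash)."""
    return hashlib.sha1(chunk).digest()


class ShardedChunkIndex:
    """Hypothetical chunk index partitioned over num_nodes in-memory shards."""

    def __init__(self, num_nodes: int):
        self.shards = [set() for _ in range(num_nodes)]

    def owner(self, fp: bytes) -> int:
        # Each fingerprint has exactly one owning shard, chosen by its prefix.
        return int.from_bytes(fp[:8], "big") % len(self.shards)

    def add_if_new(self, fp: bytes) -> bool:
        """Return True if fp was unseen (chunk must be stored), else False."""
        shard = self.shards[self.owner(fp)]
        if fp in shard:
            return False
        shard.add(fp)
        return True


def count_new_chunks(data: bytes, index: ShardedChunkIndex) -> int:
    """Count the chunks of data that the index has not seen before."""
    return sum(index.add_if_new(fingerprint(c)) for c in chunks(data))


# Five chunks, only two distinct contents.
sample = b"AAAAAAAA" * 3 + b"BBBBBBBB" + b"AAAAAAAA"
single_node = count_new_chunks(sample, ShardedChunkIndex(1))
four_nodes = count_new_chunks(sample, ShardedChunkIndex(4))
```

In this sketch `single_node` and `four_nodes` are equal, since partitioning the index only changes where a fingerprint is looked up, not whether it is found. A real cluster would replace the set lookups with network requests, which is where locality and load balancing come into play.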
