31st International Conference on Distributed Computing Systems

Fused Data Structures for Handling Multiple Faults in Distributed Systems


Abstract

This paper describes a technique, based on the concept of fused backups, for correcting crash faults in large data structures hosted on distributed servers. The prevalent solution to this problem is replication. To correct f crash faults among n distinct data structures, replication requires nf additional replicas. If each of the primaries contains O(m) nodes of size O(s) each, this translates to O(nmsf) total backup space. Our technique combines erasure-correcting codes with selective replication to correct f crash faults using just f additional backups, consuming O(msf) total backup space while incurring minimal overhead during normal operation. Because the data is maintained in coded form, recovery is costlier than with replication. However, in a system with infrequent faults, the savings in space outweigh the cost of recovery. We explore the theory and algorithms for these fused backups and provide a library of such backups for all the data structures in the Java 6 Collection framework. Our experimental evaluation confirms that fused backups use almost n times less space than replication while adding very little overhead to updates. Many real-world distributed systems, such as Amazon's Dynamo data store, use replication to achieve reliability. An alternative, fusion-based design can yield significant savings in space as well as in other resources such as power.
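The space argument above can be illustrated with a small sketch. It is not the authors' library: it assumes the simplest possible case, f = 1, with XOR parity standing in for the erasure-correcting code, and it models each primary as a fixed-length list of integer slots rather than a dynamic data structure. All function names here are hypothetical.

```python
from functools import reduce
from operator import xor


def replication_backup_nodes(n, m, f):
    """Backup nodes when each of n primaries (m nodes each) gets f full replicas."""
    return n * m * f


def fused_backup_nodes(m, f):
    """Backup nodes when f fused backups of about m nodes each cover all primaries."""
    return m * f


def fuse(primaries):
    """XOR corresponding slots of equal-length primaries into one fused backup (f = 1)."""
    return [reduce(xor, column) for column in zip(*primaries)]


def recover(fused, survivors):
    """Rebuild the single crashed primary from the fused backup and the survivors."""
    return fuse([fused] + survivors)


# Three hypothetical primaries, three integer slots each.
primaries = [[3, 7, 1], [4, 4, 9], [8, 0, 2]]
fused = fuse(primaries)

# Simulate a crash of the second primary and rebuild it from the rest.
recovered = recover(fused, [primaries[0], primaries[2]])
```

Recovery here has to read the fused backup and every surviving primary, whereas a replica could simply be copied, which is the recovery cost the abstract refers to. In exchange, one backup of m slots covers all n primaries instead of n separate replicas.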
