
Efficient Erasure Coding in Distributed Storage Systems


Abstract

Distributed storage systems store a substantial amount of data on many commodity servers. As server failures are common, it is critical for distributed storage systems to store redundant data to tolerate such failures. Conventionally, a distributed storage system uses replication for redundancy. Recently, erasure coding has been increasingly replacing replication thanks to its lower storage overhead. However, in many scenarios, erasure coding incurs additional overhead, such as higher network traffic, or lowers the performance of data accesses. In this dissertation, we address some of these challenges in two broad areas.

Erasure coding with the optimal network overhead. Traditional erasure codes incur high network overhead when data needs to be reconstructed after a server failure. We study the problem of constructing erasure codes that consume the optimal network traffic to reconstruct data from multiple failures. We start from a new construction of minimum-storage cooperative regenerating (MSCR) codes that reconstruct data from two failures with the optimal network traffic. We show that an existing minimum-storage regenerating (MSR) code is also an MSCR code for two failures, and vice versa. For more general cases, we propose Beehive codes that optimize the volume of network traffic to reconstruct data from more than two failures, with storage overhead only slightly higher than the optimum.

I/O efficient erasure coding and systems. Traditionally, erasure coding incurs higher I/O overhead because of its encoding and decoding operations. In this dissertation, we propose solutions to minimize the overhead of writing and reading erasure-coded data. On the input side, we design and implement Mist, a new mechanism for disseminating erasure-coded data efficiently to multiple receiving servers in data centers. On the output side, we exploit the demand skewness in distributed storage systems and propose Zebra, a framework that dynamically encodes data into multiple tiers according to demand, reducing the overall overhead of reading erasure-coded data. We also investigate the data parallelism of erasure coding, which may affect the performance of parallel data processing jobs, such as MapReduce, that run on erasure-coded data, and construct Carousel codes that allow data parallelism to be expanded to an arbitrary degree.
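The trade-off described in the abstract (lower storage overhead than replication, but higher network traffic during reconstruction) can be illustrated with a toy example. The sketch below is not taken from the dissertation: it uses a single XOR parity block as a stand-in for a traditional erasure code, and the function names `xor_blocks`, `encode`, and `repair` are hypothetical. It does not implement MSR, MSCR, or Beehive codes.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode(data_blocks):
    """Append one XOR parity block to k data blocks (tolerates one failure)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def repair(stored, lost_index):
    """Rebuild one lost block by reading every surviving block.

    Returns the rebuilt block and the number of bytes that had to be read,
    which stands in for the network traffic of the repair.
    """
    survivors = [b for i, b in enumerate(stored) if i != lost_index]
    traffic = sum(len(b) for b in survivors)
    return xor_blocks(survivors), traffic

# Illustrative parameters (assumptions): k = 4 data blocks of 8 bytes each.
k = 4
data = [bytes([i]) * 8 for i in range(1, k + 1)]
stored = encode(data)

rebuilt, traffic = repair(stored, lost_index=2)

# Storage overhead: (k + 1) / k for this code versus 3.0 for 3-way replication.
storage_overhead = len(stored) / k
# Repair traffic: k blocks are read to rebuild one block, whereas
# replication would copy a single block from a surviving replica.
traffic_in_blocks = traffic // len(data[0])
```

In this toy setting, rebuilding one 8-byte block requires reading four surviving blocks, while replication would copy a single block. Codes that reduce this kind of repair traffic are the subject of the first area the abstract describes.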

Bibliographic record

  • Author: Li, Jun
  • Author affiliation: University of Toronto (Canada)
  • Degree-granting institution: University of Toronto (Canada)
  • Subject: Computer engineering
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 195
  • Original format: PDF
  • Language: English
