Efficient Erasure Coding in Distributed Storage Systems

Abstract

Distributed storage systems store a substantial amount of data on many commodity servers. Because server failures are common, it is critical for such systems to store redundant data in order to tolerate them. Conventionally, a distributed storage system provides this redundancy by replicating data. Recently, erasure coding has been increasingly replacing replication thanks to its lower storage overhead. However, in many scenarios erasure coding incurs additional overhead, such as higher network traffic, or lowers the performance of data accesses. In this dissertation, we address some of these challenges in two broad areas.

Erasure coding with the optimal network overhead. Traditional erasure codes incur high network overhead when data needs to be reconstructed after a server failure. We study the problem of constructing erasure codes that incur the optimal network traffic when reconstructing data after multiple failures. We start from a new construction of minimum-storage cooperative regenerating (MSCR) codes that reconstruct data from two failures with the optimal network traffic. We show that an existing minimum-storage regenerating (MSR) code is also an MSCR code for two failures, and vice versa. For more general cases, we propose Beehive codes, which optimize the volume of network traffic needed to reconstruct data from more than two failures, with storage overhead only slightly higher than the optimum.

I/O efficient erasure coding and systems. Traditionally, erasure coding incurs higher I/O overhead because of its encoding and decoding operations. In this dissertation, we propose solutions to minimize the overhead of writing and reading erasure-coded data. On the input side, we design and implement Mist, a new mechanism for disseminating erasure-coded data efficiently to multiple receiving servers in data centers. On the output side, we exploit the demand skewness in distributed storage systems and propose Zebra, a framework that dynamically encodes data into multiple tiers according to demand, reducing the overall overhead of reading erasure-coded data. We also investigate the data parallelism of erasure coding, which may affect the performance of parallel data processing jobs, such as MapReduce, that run on erasure-coded data, and construct Carousel codes that allow data parallelism to be expanded to an arbitrary degree.
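The trade-off the abstract describes, lower storage overhead against higher reconstruction traffic, can be made concrete with a small calculation. The sketch below is illustrative only: the parameters (k = 6 data blocks, r = 3 parity blocks, d = 8 helper servers, 3 replicas) are hypothetical, and the d / (d - k + 1) repair cost is the standard minimum-storage regenerating (MSR) bound from the regenerating-codes literature, not a figure taken from this record. It does not model the MSCR, Beehive, or Carousel constructions.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Storage consumed per unit of user data when keeping `copies` replicas."""
    return Fraction(copies)


def mds_overhead(k: int, r: int) -> Fraction:
    """Storage per unit of user data for a (k + r, k) MDS erasure code."""
    return Fraction(k + r, k)


def conventional_repair_traffic(k: int) -> Fraction:
    """Blocks downloaded to rebuild one lost block by ordinary decoding.

    A conventional MDS code (e.g. Reed-Solomon) reads k surviving blocks.
    """
    return Fraction(k)


def msr_repair_traffic(k: int, d: int) -> Fraction:
    """Block-equivalents downloaded to rebuild one lost block at the MSR point.

    Assumes d helpers (k <= d <= n - 1), each sending 1 / (d - k + 1) of a block.
    """
    if d < k:
        raise ValueError("an MSR repair needs at least k helpers")
    return Fraction(d, d - k + 1)


if __name__ == "__main__":
    k, r, d = 6, 3, 8  # hypothetical parameters, chosen for illustration
    print("3-way replication storage overhead:", replication_overhead(3))
    print("(9, 6) MDS code storage overhead:", mds_overhead(k, r))
    print("conventional repair traffic (blocks):", conventional_repair_traffic(k))
    print("MSR repair traffic (blocks):", msr_repair_traffic(k, d))
```

With d = k the MSR expression reduces to k blocks, the same as conventional decoding; the saving comes only from contacting more than k helpers.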

Bibliographic record

Author: Li, Jun
Author affiliation: University of Toronto (Canada)
Degree-granting institution: University of Toronto (Canada)
Subject: Computer engineering
Degree: Ph.D.
Year: 2017
Pages: 195 p.
Original format: PDF
Language: English

Similar literature

1. Efficiently Coding Replicas to Erasure Coded Blocks in Distributed Storage Systems [J]. Zimu Yuan, Huiying Liu. IEEE Communications Letters, 2017, no. 9.
2. TERS: a traffic efficient repair scheme for repairing multiple losses in erasure-coded distributed storage systems [J]. LiMing Zheng, Xuan Wang, XiaoBo Tian. International Journal of Computational Science and Engineering, 2018, no. 3.
3. LAR: Locality-Aware Reconstruction for erasure-coded distributed storage systems [J]. Fangliang Xu, Yijie Wang, Xiaoqiang Pei. Concurrency and Computation: Practice and Experience, 2019, no. 11.
4. D3: Deterministic Data Distribution for Efficient Data Reconstruction in Erasure-Coded Distributed Storage Systems [C]. Zhipeng Li, Min Lv, Yinlong Xu. IEEE International Parallel and Distributed Processing Symposium, 2019.
5. Erasure Codes for Optimal Node Repairs in Distributed Storage Systems [D]. Goparaju, Sreechakra. 2014.
6. NOREC4DNA: using near-optimal rateless erasure codes for DNA storage [O]. Peter Michael Schwarz, Bernd Freisleben. 2021.
7. Using Erasure Codes Efficiently for Storage in a Distributed System [O]. Marcos K. Aguilera, Ramaprabhu Janakiraman. 2005.
