Coding Schemes for Distributed Storage Systems

机译：分布式存储系统的编码方案

获取原文

获取原文并翻译 | 示例

页面导航

摘要
著录项
相似文献
相关主题

摘要

This thesis is devoted to problems in error-correcting codes motivated by data integrity problems arising in large-scale distributed storage systems. We study properties and constructions of Maximum Distance Separable (MDS) codes, which are widely used in storage applications since they provide the maximum failure tolerance for a given amount of storage overhead.;Among the parameters of the code that are important for storage applications are: the amount of data transferred in the system during node repair (the repair bandwidth), which characterizes the network usage, and the volume of accessed data, which corresponds to the number of disk I/O operations. Therefore, recent research on MDS codes for distributed storage has focused on codes that can minimize these two quantities. A lower bound on the repair bandwidth of a code, called the cut-set bound, was proved by Dimakis et al. in 2010, and codes that attain this bound are said to have the optimal repair property. Explicit optimal-repair low-rate (rate ≤ 1/2) MDS codes were constructed by Rashmi et al. in 2011. At the same time, large-scale distributed systems such as the Google File System and Hadoop Distributed File System, employ high-rate (rate > 1/2) MDS codes due to the need of reducing storage overhead. Until recently, except for some particular cases, no general explicit constructions of high-rate optimal-repair MDS codes were known.;In this thesis, we present the first explicit constructions of optimal-repair MDS codes, thereby providing a solution to the general construction problem of such codes for the high-rate regime.;More specifically, we construct explicit MDS codes that can repair any number of failed nodes from any number of helper nodes with the smallest possible amount of downloaded/accessed data. For the particular case of repairing a single node failure, we further present an explicit family of MDS codes that minimize the amount of accessed data during the repair. This family of codes has an additional favorable property that the node size (the amount of information stored in the node) is also the smallest possible. Reducing the node size directly translates into reducing the complexity of storage systems.;While most studies on MDS codes with optimal repair bandwidth focus on array codes, the repair problem of widely used scalar codes such as Reed-Solomon codes has also recently attracted attention of researchers. It has been an open problem whether scalar linear MDS codes can achieve the cut-set bound. In this thesis, we answer this question in the affirmative by giving explicit constructions of Reed-Solomon codes that can be repaired at the cut-set bound. We also prove a lower bound on the node size of optimally repairable scalar MDS codes, showing that the node size of our RS codes is close to the best possible for scalar linear codes.;Finally, we extend the concept of repair bandwidth from erasure correction to error correction, which forms a new problem in coding theory. We prove a bound on the amount of downloaded information for this problem and present explicit code families that attain this bound for a wide range of parameters.

机译：本文致力于解决由大型分布式存储系统中出现的数据完整性问题引起的纠错代码中的问题。我们研究了最大距离可分离（MDS）代码的属性和构造，由于它们在给定的存储开销量下提供了最大的容错能力，因此在存储应用中得到了广泛的应用;其中对存储应用重要的代码参数包括：表示节点修复期间系统中传输的数据量（修复带宽），它表示网络使用情况，而访问的数据量则与磁盘I / O操作数相对应。因此，最近对用于分布式存储的MDS代码的研究集中在可以最小化这两个数量的代码上。 Dimakis等人证明了代码修复带宽的下限，称为割集限制。在2010年，据说达到此界限的代码具有最佳的修复性能。 Rashmi等人构造了显式的最优修复低速率（速率≤1/2）MDS代码。在2011年。同时，由于需要减少存储开销，因此诸如Google File System和Hadoop Distributed File System之类的大规模分布式系统采用高速率（比率> 1/2）的MDS代码。直到最近，除了一些特殊情况外，还没有关于高速率最优修复MDS代码的一般显式构造。；本文中，我们提出了最优修复MDS代码的第一个显式构造，从而为通用解决方案提供了一种解决方案。尤其是，我们构造了显式的MDS代码，该代码可以使用尽可能少的下载/访问数据量，从任何数量的辅助节点中修复任意数量的故障节点。对于修复单节点故障的特殊情况，我们进一步介绍了MDS代码的显式族，该类MDS代码可最大程度地减少修复期间访问的数据量。该代码家族具有一个额外的有利属性，即节点大小（存储在节点中的信息量）也尽可能最小。减小节点大小直接转化为降低存储系统的复杂性。虽然大多数对具有最佳修复带宽的MDS代码的研究都集中在阵列代码上，但最近广泛使用的标量代码（例如Reed-Solomon代码）的修复问题也引起了人们的关注。研究人员。标量线性MDS代码是否可以实现割集边界一直是一个悬而未决的问题。在本文中，我们通过给出明确的Reed-Solomon码构造来肯定地回答这个问题，该码可以在割集边界处修复。我们还证明了最优可修复标量MDS代码的节点大小的下界，这表明我们的RS代码的节点大小已接近标量线性代码的最佳可能值。;最后，我们从擦除校正扩展了修复带宽的概念纠错，这在编码理论中形成了一个新问题。我们证明了此问题的下载信息量受到限制，并提出了可用于多种参数的明确代码族。

著录项

作者
Ye, Min.;
展开▼
作者单位

University of Maryland, College Park.;

展开▼
授予单位 University of Maryland, College Park.;
学科 Electrical engineering.
学位 Ph.D.
年度 2017
页码 105 p.
总页数 105
原文格式 PDF
正文语种 eng
中图分类
关键词

相似文献

外文文献
中文文献
专利

1. Towards a delivery scheme for speedup of data backup in distributed storage systems using erasure codes [J] . You Pengfei, Huang Zhen, Peng Yuxing, Journal of supercomputing . 2019,第1期

机译：迈向使用擦除代码加快分布式存储系统中数据备份速度的交付方案
2. TERS: a traffic efficient repair scheme for repairing multiple losses in erasure-coded distributed storage systems [J] . LiMing Zheng, Xuan Wang, XiaoBo Tian, International Journal of Computational Science and Engineering . 2018,第3期

机译：TERS：用于修复擦除编码分布式存储系统中多损耗的流量有效修复方案
3. I/O-Efficient Scaling Schemes for Distributed Storage Systems with CRS Codes [J] . Si Wu, Yinlong Xu, Yongkun Li, IEEE Transactions on Parallel and Distributed Systems . 2016,第9期

机译：具有CRS代码的分布式存储系统的I / O有效扩展方案
4. Latency Analysis of Flexible Redundant Scheme in MDS-Coded Distributed Storage Systems [C] . Qiqi Shuai, Victor O. K. Li IEEE Global Communications Conference . 2017

机译：MDS编码的分布式存储系统中灵活冗余方案的延迟分析
5. Design of Exact Repair Regenerating Codes for Distributed Storage Systems [D] . Elyasi, Mehran. 2021

机译：分布式存储系统精确修复再生码设计
6. A Soft–Hard Combination Decision Fusion Scheme for a Clustered Distributed Detection System with Multiple Sensors [O] . Junhai Luo, Xiaoting He 2018

机译：具有多个传感器的集群分布式检测系统的软硬组合决策融合方案
7. Coding Schemes for Distributed Storage Systems: Implementation and Improvements [O] . Saman Tabatabaeian, Rajendra P. Lal, Wilson Naik 2014

机译：分布式存储系统的编码方案：实施和改进

Coding Schemes for Distributed Storage Systems

摘要

著录项

相似文献

相关主题

期刊订阅