Repairing multiple failures adaptively with erasure codes in distributed storage systems

Pei Xiaoqiang; Wang Yijie; Ma Xingkong; Xu Fangliang


Abstract

Repairing multiple failures in distributed storage systems poses a challenge for erasure coding: how to minimize the repair time while incurring the least extra repair network traffic. Existing repair schemes designed for a single failure suffer from high network traffic cost, because they repair multiple failures serially. Repair schemes designed for multiple failures suffer from long repair times, because of their centralized repair structure. In this paper, we propose a decentralized adaptive repair scheme, called DARS, to minimize the repair time with the least extra network traffic cost. Specifically, we propose a three-layer repair model that supports the repair of both single and multiple failures. To keep the repair time low, a bandwidth-aware node selection technique guides the selection of nodes, and a line-structured data transmission technique organizes the data transmission between the providers and the newcomer. To keep the extra network traffic cost low, a core-based data distribution technique organizes the data transmission between the coordinator and the other newcomers, and an intersection provider adjustment technique adaptively adjusts the number of intersection providers. Moreover, we adopt ‘lazy repair’ within a stripe to further reduce the repair network traffic cost. We implement and evaluate DARS on our raid distributed storage system under various parameter settings with 30 physical machines and 200 virtual machines. Experimental results confirm that DARS reduces the repair time by 29% and 55% on average compared with tree-structured repair and CORE, respectively. Copyright © 2015 John Wiley & Sons, Ltd.
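
The abstract names bandwidth-aware node selection and line-structured data transmission but does not give their details. The Python sketch below is only a rough illustration of why the two ideas can shorten a repair, not the DARS algorithm itself. Everything in it is an assumption made for illustration: the `Node` type, the function names, the "pick the k fastest nodes" rule, and the simplified timing model in which a star-shaped repair pushes k blocks through the newcomer's inbound link while a pipelined chain is limited only by its slowest link.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    bandwidth_mbps: float  # assumed available bandwidth, in megabits per second


def select_providers(candidates, k):
    """Hypothetical selection rule: keep the k candidates with the most bandwidth."""
    if k > len(candidates):
        raise ValueError("not enough surviving nodes to repair")
    return sorted(candidates, key=lambda n: n.bandwidth_mbps, reverse=True)[:k]


def star_repair_seconds(providers, newcomer, block_mbit):
    """Simplified star model: every provider sends one block straight to the
    newcomer, so the newcomer's inbound link has to carry k blocks."""
    inbound = len(providers) * block_mbit / newcomer.bandwidth_mbps
    slowest_sender = block_mbit / min(p.bandwidth_mbps for p in providers)
    return max(inbound, slowest_sender)


def line_repair_seconds(providers, newcomer, block_mbit):
    """Simplified line model: providers form a chain, each forwarding a partial
    combination to the next node. With fine-grained pipelining, every link
    carries about one block, so the slowest link bounds the total time."""
    bottleneck = min([p.bandwidth_mbps for p in providers] + [newcomer.bandwidth_mbps])
    return block_mbit / bottleneck


if __name__ == "__main__":
    survivors = [Node("a", 100.0), Node("b", 400.0), Node("c", 200.0),
                 Node("d", 800.0), Node("e", 50.0)]
    newcomer = Node("new", 400.0)
    chosen = select_providers(survivors, k=3)
    print([n.name for n in chosen])
    print(star_repair_seconds(chosen, newcomer, block_mbit=512.0))
    print(line_repair_seconds(chosen, newcomer, block_mbit=512.0))
```

In this toy model the line structure wins whenever the newcomer's inbound link is the bottleneck, which is the situation the centralized schemes criticized in the abstract run into. Real systems would also need to account for latency, decoding cost, and changing bandwidth.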


Bibliographic details

Source
Concurrency and Computation: Practice and Experience | 2016, Issue 5 | pp. 1437-1461 | 25 pages
Authors
Pei Xiaoqiang; Wang Yijie; Ma Xingkong; Xu Fangliang
Author affiliations

National University of Defense Technology, National Key Laboratory of Parallel and Distributed Processing, College of Computer, Changsha, Hunan, China (listed for each of the four authors)

Original format: PDF
Language: English

Similar literature

1. Repair Tree: Fast Repair for Single Failure in Erasure-Coded Distributed Storage Systems [J]. Huayu Zhang, Hui Li, Shuo-Yen Robert Li. IEEE Transactions on Parallel and Distributed Systems, 2017, Issue 6.
2. TERS: a traffic efficient repair scheme for repairing multiple losses in erasure-coded distributed storage systems [J]. LiMing Zheng, Xuan Wang, XiaoBo Tian. International Journal of Computational Science and Engineering, 2018, Issue 3.
3. Beehive: Erasure Codes for Fixing Multiple Failures in Distributed Storage Systems [J]. Jun Li, Baochun Li. IEEE Transactions on Parallel and Distributed Systems, 2017, Issue 5.
4. Fast Repair for Single Failure in Erasure Coding-Based Distributed Storage Systems [C]. Zhang Huayu, Li Hui, Zhu Bing. 2014 IEEE 33rd International Symposium on Reliable Distributed Systems, 2014.
5. Erasure Codes for Optimal Node Repairs in Distributed Storage Systems [D]. Goparaju, Sreechakra. 2014.
6. NOREC4DNA: using near-optimal rateless erasure codes for DNA storage [O]. Peter Michael Schwarz, Bernd Freisleben. 2021.
7. Cooperative Repair of Multiple Node Failures in Distributed Storage Systems [O]. Shum, Kenneth W., Chen, Junyu. 2016.
