Journal: Concurrency and Computation: Practice and Experience

Repairing multiple failures adaptively with erasure codes in distributed storage systems

Abstract

Repairing multiple failures in distributed storage systems poses a challenge for erasure coding: how to minimize repair time while incurring the least extra repair network traffic. Existing repair schemes designed for a single failure incur high network traffic cost because they repair multiple failures serially, while repair schemes designed for multiple failures suffer from long repair times because of their centralized repair structure. In this paper, we propose a decentralized adaptive repair scheme, called DARS, to minimize repair time with the least extra network traffic cost. Specifically, we propose a three-layer repair model that supports repairs of both single and multiple failures. To reduce repair time, a bandwidth-aware node selection technique guides the selection of nodes, and a line-structured data transmission technique organizes the data transmission between the providers and the newcomer. To minimize extra network traffic, a core-based data distribution technique organizes the data transmission between the coordinator and the other newcomers, and an intersection provider adjustment technique adaptively adjusts the number of intersection providers. Moreover, we adopt ‘lazy repair’ within a stripe to further reduce repair network traffic. We implement and evaluate DARS on our raid distributed storage system under various parameter settings with 30 physical machines and 200 virtual machines. Experimental results confirm that DARS reduces the repair time by 29% and 55% on average compared with tree-structured repair and CORE, respectively. Copyright © 2015 John Wiley & Sons, Ltd.

Bibliographic details

  • Author affiliations

    National University of Defense Technology, National Key Laboratory of Parallel and Distributed Processing, College of Computer, Changsha, Hunan, China;

    National University of Defense Technology, National Key Laboratory of Parallel and Distributed Processing, College of Computer, Changsha, Hunan, China;

    National University of Defense Technology, National Key Laboratory of Parallel and Distributed Processing, College of Computer, Changsha, Hunan, China;

    National University of Defense Technology, National Key Laboratory of Parallel and Distributed Processing, College of Computer, Changsha, Hunan, China;

  • Original format: PDF
  • Language: English
