Journal: Computer Networks

Dynamic single node failure recovery in distributed storage systems

Abstract

Many erasure coding techniques have emerged to provide reliability in practical distributed storage systems. We apply fractional repetition coding to the given data and optimize the allocation of data blocks to system nodes so as to minimize the system repair cost. We selected fractional repetition coding for its simple repair mechanism, which minimizes both the repair bandwidth and the disk access bandwidth, and for its uncoded repair process. To minimize the system repair cost, we formulate the problem using incidence matrices and solve it heuristically with genetic algorithms for all possible cases of single node failure. We then address three practical extensions that account for newly arriving blocks, newly arriving nodes, and variable priority files, respectively. For the first two extensions, we propose a re-optimization mechanism for the storage allocation matrix that can be easily implemented in real time without redistributing the blocks already stored on the nodes. The third extension is addressed by implementing variable fractional repetition codes, which are shown to achieve significant cost reduction. The contributions of the paper are fourfold: (i) generating an optimized block distribution scheme among the nodes of a given data center for fixed and variable size blocks; (ii) optimizing storage allocation under dynamic environments with data block arrivals; (iii) optimizing storage allocation with newly added storage nodes; and (iv) generating an effective block distribution scheme among the nodes that accounts for varying priorities among data blocks. We present a wide range of results for the proposed algorithms and the scenarios considered to quantify the achievable performance gains. © 2016 Elsevier B.V. All rights reserved.
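The abstract gives no formulas, so the following is only a minimal sketch of the kind of objective it describes: a storage allocation written as a 0/1 incidence matrix, and a repair cost summed over every possible single node failure. Everything named here is an assumption for illustration and is not taken from the paper: the complete-graph construction (one block per pair of nodes, replication factor 2), the `link_cost` matrix of per-block transfer costs between nodes, and the function names. The genetic algorithm search is not shown; a search of that kind could use `system_repair_cost` as its fitness function.

```python
from itertools import combinations


def complete_graph_allocation(n):
    """Build an n x C(n, 2) incidence matrix (assumed construction).

    Each block corresponds to one pair of nodes and is stored, uncoded,
    on exactly those two nodes. alloc[i][j] == 1 means node i holds block j.
    """
    pairs = list(combinations(range(n), 2))
    return [[1 if node in pair else 0 for pair in pairs] for node in range(n)]


def repair_cost(alloc, link_cost, failed):
    """Cost of rebuilding one failed node by copying its blocks from replicas.

    link_cost[a][b] is a hypothetical cost of moving one block from node b
    to node a. For each lost block, the cheapest surviving replica is used.
    """
    total = 0
    for block, stored in enumerate(alloc[failed]):
        if not stored:
            continue
        helpers = [h for h in range(len(alloc))
                   if h != failed and alloc[h][block]]
        if not helpers:
            raise ValueError(f"block {block} has no surviving replica")
        total += min(link_cost[failed][h] for h in helpers)
    return total


def system_repair_cost(alloc, link_cost):
    """Sum the repair cost over all possible single node failures."""
    return sum(repair_cost(alloc, link_cost, f) for f in range(len(alloc)))


# Usage: 4 nodes, 6 blocks, every transfer costs 1 (assumed uniform costs).
alloc = complete_graph_allocation(4)
uniform = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
print(system_repair_cost(alloc, uniform))  # 12: 4 failures x 3 blocks x cost 1
```

With a replication factor of 2 each lost block has exactly one surviving replica, so the `min` in `repair_cost` is trivial here; it only matters for allocations with higher replication, where the choice of helper node changes the cost.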
