IEEE Transactions on Computers

A Modeling Framework for Reliability of Erasure Codes in SSD Arrays

Abstract

The emergence of Solid-State Drives (SSDs) has transformed the data storage industry, where SSDs are rapidly replacing Hard Disk Drives (HDDs) thanks to their superior performance and power efficiency. At the same time, SSDs suffer from reliability issues caused by bit errors, bad blocks, and bad chips. To improve reliability, Redundant Array of Independent Disks (RAID) configurations, originally proposed to increase both the performance and the reliability of HDDs, are also applied to SSD arrays. However, conventional HDD RAID reliability models cannot be applied to SSD arrays unchanged, because the nature of failures in SSDs is fundamentally different from that in HDDs. Previous studies on the reliability of SSD arrays rely on outdated SSD failure data and consider only a limited set of failure types, namely device failures and page failures caused by bit errors, whereas recent field studies have reported additional failure types, including bad blocks and bad chips, as well as a high correlation between failures. In this paper, we investigate the reliability of SSD arrays using field storage traces and real-system implementations of conventional and emerging erasure codes. Reliability is evaluated by statistical fault injection experiments that post-process the usage logs obtained from the real-system implementation, while the fault/failure attributes are taken from state-of-the-art field data reported in previous work. As a case study, we examine conventional RAID5 and RAID6 and the emerging Partial-MDS (PMDS) codes, Sector-Disk (SD) codes, and STAIR codes in terms of both reliability and performance, using the open-source software RAID controller MD (in Linux kernel version 3.10.0-327) and arrays of Samsung 850 Pro SSDs.
Our detailed analysis of the data-loss breakdown shows that a) emerging erasure codes fail to replace RAID6 in terms of reliability, b) row-wise erasure codes are the most efficient choice for contemporary SSD devices, and c) previous models overestimate SSD array reliability by up to six orders of magnitude, because they focus only on the coincidence of bad pages (bit errors) and bad chips within a data stripe, which accounts for only a minority of the root causes of data loss in SSD arrays. Our experiments show that the combination of bad chips with bad blocks is the major source of data loss in RAID5 and in the emerging codes (contributing more than 54 percent and 90 percent of data loss, respectively), while RAID6 remains robust under these failure combinations. Finally, the fault injection results reveal that both SSD array reliability and the failure breakdown are significantly correlated with SSD type.
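To make the stripe-level reasoning concrete, here is a minimal Monte Carlo fault-injection sketch. It is not the authors' framework. It assumes that each fault type (bad page, bad block, bad chip) erases one column of a stripe independently with a fixed, made-up probability, and it treats RAID5 and RAID6 as MDS codes that tolerate one and two erased columns per stripe, respectively. The names `FAULT_PROBS`, `TOLERANCE`, `stripe_lost`, `inject_stripe`, `simulate`, and `loss_breakdown` are hypothetical. The independence assumption is a deliberate simplification: the field studies mentioned above report correlated failures, which this sketch does not model. PMDS, SD, and STAIR codes are left out because their sector-level tolerance does not reduce to a single per-stripe erasure count.

```python
import random
from collections import Counter

# Hypothetical probability that one column of a stripe is erased by each
# fault type. Illustrative numbers only, NOT field data from the paper.
FAULT_PROBS = {"bad_page": 2e-3, "bad_block": 1e-3, "bad_chip": 5e-4}

# Erased columns per stripe each code can recover (simplified MDS view).
TOLERANCE = {"RAID5": 1, "RAID6": 2}


def stripe_lost(code, faults):
    """Return True if a stripe with these erased columns loses data under `code`."""
    return len(faults) > TOLERANCE[code]


def inject_stripe(n_columns, rng):
    """Inject independent faults; return the fault type that erased each hit column."""
    faults = []
    for _ in range(n_columns):
        for kind, p in FAULT_PROBS.items():
            if rng.random() < p:
                faults.append(kind)
                break  # one fault is enough to erase this column
    return faults


def simulate(code, n_columns, trials, seed=0):
    """Count lost stripes, keyed by the sorted combination of fault types."""
    rng = random.Random(seed)
    losses = Counter()
    for _ in range(trials):
        faults = inject_stripe(n_columns, rng)
        if stripe_lost(code, faults):
            losses[tuple(sorted(faults))] += 1
    return losses


def loss_breakdown(losses):
    """Fraction of lost stripes attributed to each fault combination."""
    total = sum(losses.values())
    return {combo: n / total for combo, n in losses.items()} if total else {}


if __name__ == "__main__":
    for code in ("RAID5", "RAID6"):
        losses = simulate(code, n_columns=8, trials=100_000, seed=42)
        print(code, sum(losses.values()), loss_breakdown(losses))
```

For example, `stripe_lost("RAID5", ["bad_chip", "bad_block"])` is `True` while `stripe_lost("RAID6", ["bad_chip", "bad_block"])` is `False`, which mirrors the observation that bad-chip/bad-block coincidences cause data loss in RAID5 but not in RAID6. Because `inject_stripe` consumes random numbers the same way regardless of the code, running `simulate` with the same seed for both codes injects identical faults, so every stripe lost under RAID6 is also lost under RAID5.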