A Modeling Framework for Reliability of Erasure Codes in SSD Arrays

Kishani Mostafa; Ahmadian Saba; Asadi Hossein

Abstract

The emergence of Solid-State Drives (SSDs) has transformed the data storage industry, where they are rapidly replacing Hard Disk Drives (HDDs) owing to their superior performance and power efficiency. However, SSDs suffer from reliability issues caused by bit errors, bad blocks, and bad chips. To improve reliability, Redundant Array of Independent Disks (RAID) configurations, originally proposed to increase both the performance and reliability of HDDs, are also applied to SSD arrays. The conventional reliability models of HDD RAID cannot be applied directly to SSD arrays, however, because the nature of failures in SSDs is entirely different from that in HDDs. Previous studies on the reliability of SSD arrays are based on outdated SSD failure data and focus on only a limited set of failure types, namely device failures and page failures caused by bit errors, while recent field studies have reported other failure types, including bad blocks and bad chips, as well as a high correlation between failures. In this paper, we investigate the reliability of SSD arrays using field storage traces and a real-system implementation of conventional and emerging erasure codes. Reliability is evaluated through statistical fault injection experiments that post-process the usage logs obtained from the real-system implementation, while the fault/failure attributes are taken from state-of-the-art field data reported in previous works. As a case study, we examine conventional RAID5 and RAID6 and the emerging Partial-MDS (PMDS), Sector-Disk (SD), and STAIR codes in terms of both reliability and performance, using an open-source software RAID controller, MD (in Linux kernel version 3.10.0-327), and arrays of Samsung 850 Pro SSDs.
Our detailed analysis of the data loss breakdown shows that a) emerging erasure codes fail to replace RAID6 in terms of reliability, b) row-wise erasure codes are the most efficient choices for contemporary SSD devices, and c) previous models overestimate SSD array reliability by up to six orders of magnitude, because they consider only the coincidence of bad pages (bit errors) and bad chips within a data stripe, which accounts for a minority of the root causes of data loss in SSD arrays. Our experiments show that the combination of bad chips with bad blocks is the major source of data loss in RAID5 and emerging codes (contributing more than 54 and 90 percent of data loss in RAID5 and emerging codes, respectively), while RAID6 remains robust under these failure combinations. Finally, the fault injection results reveal that SSD array reliability, as well as the failure breakdown, is significantly correlated with SSD type.
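
The general idea of statistical fault injection over data stripes can be illustrated with a minimal Python sketch. This is not the authors' framework: the function names, the fault probability, and the array size are hypothetical placeholders, and faults are injected independently per chunk, which ignores the failure correlation and the distinct failure types (bad pages, bad blocks, bad chips) that the abstract identifies as important. The only facts it relies on are that RAID5 tolerates one lost chunk per stripe and RAID6 tolerates two.

```python
import random


def inject_faults(num_stripes, num_devices, p_chunk_fault, rng):
    """Return, for each stripe, how many of its chunks received a fault.

    Assumption: every chunk fails independently with the same
    probability (a simplification; real SSD failures are correlated).
    """
    return [
        sum(rng.random() < p_chunk_fault for _ in range(num_devices))
        for _ in range(num_stripes)
    ]


def count_lost_stripes(faults_per_stripe, tolerance):
    """Count stripes whose faulty chunks exceed the code's tolerance."""
    return sum(1 for faults in faults_per_stripe if faults > tolerance)


def run_experiment(num_stripes=100_000, num_devices=8,
                   p_chunk_fault=0.01, seed=0):
    """Inject one fault pattern and evaluate it under RAID5 and RAID6.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    faults = inject_faults(num_stripes, num_devices, p_chunk_fault, rng)
    return {
        "RAID5": count_lost_stripes(faults, tolerance=1),  # one parity chunk
        "RAID6": count_lost_stripes(faults, tolerance=2),  # two parity chunks
    }


if __name__ == "__main__":
    print(run_experiment())
```

Because both codes are evaluated against the same injected fault pattern, the RAID6 loss count can never exceed the RAID5 count in this sketch. A more faithful experiment would draw fault footprints per failure type and replay them against real usage logs, as the abstract describes.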

Bibliographic details

Source
IEEE Transactions on Computers | 2020, Issue 5 | pp. 649-665 | 17 pages
Authors
Kishani Mostafa; Ahmadian Saba; Asadi Hossein;
Author affiliations

Sharif Univ Technol Dept Comp Engn Data Storage Networks & Proc DSN Lab Tehran 1458889694 Iran;

Sharif Univ Technol Dept Comp Engn Data Storage Networks & Proc DSN Lab Tehran 1458889694 Iran;

Sharif Univ Technol Dept Comp Engn Data Storage Networks & Proc DSN Lab Tehran 1458889694 Iran;

Original format: PDF
Language: English
Keywords
SSD arrays; reliability; modeling; erasure codes; RAID;
