Reliability Modelling Of Whole RAID Storage Subsystems

Abstract

Reliability modelling of RAID storage systems, together with their various components such as RAID controllers, enclosures, expanders, interconnects and disks, is important from a storage system designer's point of view. A model that can express all the failure characteristics of the whole RAID storage system can be used to evaluate design choices, perform cost-reliability trade-offs and conduct sensitivity analyses. We present a reliability model for RAID storage systems where we try to model all the components as accurately as possible. We use several state-space reduction techniques, such as aggregating all in-series components and hierarchical decomposition, to reduce the size of our model. To automate computation of reliability, we use the PRISM model checker as a CTMC solver where appropriate. Initially, we assume a simple 3-state disk reliability model with independent disk failures. Later, we assume a Weibull model for the disks; we also consider a correlated disk failure model to check correspondence with the available field data. For all other components in the system, we assume an exponential failure distribution. To use the CTMC solver, we approximate the Weibull distribution for a disk using a sum of exponentials, and we first confirm that this model gives results that are in reasonably good agreement with those from sequential Monte Carlo simulation methods for RAID disk subsystems. Next, our model for whole RAID storage systems (which includes, for example, disks, expanders and enclosures) uses Weibull distributions and, where appropriate, correlated failure modes for disks, and exponential distributions with independent failure modes for all other components. Since the CTMC solver cannot handle the size of the resulting models, we solve such models using a hierarchical decomposition technique. We are able to model fairly large configurations with up to 600 disks using this model.
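As a rough illustration of two ideas named above (aggregating in-series components and solving a small CTMC), the following Python sketch may help. It is not the thesis's model: the three states used here (all disks healthy, one disk failed and rebuilding, data loss) are a textbook single-parity assumption, and the function names and rates are hypothetical.

```python
def series_rate(rates):
    """Aggregate in-series components with exponential lifetimes.

    A series chain fails as soon as any component fails, so the chain
    is again exponential with a rate equal to the sum of the rates.
    """
    return sum(rates)


def mttdl_single_parity(n_disks, disk_fail_rate, rebuild_rate):
    """Mean time to data loss of an assumed 3-state absorbing CTMC.

    State 0: all disks healthy.
    State 1: one disk failed, rebuild in progress.
    State 2: second failure during rebuild (data loss, absorbing).
    Disk failures are assumed independent and exponential.
    """
    out_of_0 = n_disks * disk_fail_rate           # 0 -> 1
    second_failure = (n_disks - 1) * disk_fail_rate  # 1 -> 2
    out_of_1 = second_failure + rebuild_rate      # total rate leaving state 1
    # First-step analysis of the expected absorption times:
    #   T0 = 1/out_of_0 + T1
    #   T1 = 1/out_of_1 + (rebuild_rate/out_of_1) * T0
    # Solving for T0 gives the expression below; out_of_1/second_failure
    # equals 1/(1 - rebuild_rate/out_of_1) without the cancellation.
    return (1 / out_of_0 + 1 / out_of_1) * out_of_1 / second_failure


if __name__ == "__main__":
    # Hypothetical per-hour rates, chosen only for illustration.
    chain = series_rate([1e-6, 2e-6, 5e-7])  # e.g. expander, cable, backplane
    print("series chain rate per hour:", chain)
    print("MTTDL in hours:", mttdl_single_parity(8, 1e-5, 0.1))
```

The closed form implied by these equations is ((2n-1)λ + μ) / (n(n-1)λ²) for n disks, failure rate λ and rebuild rate μ. Models of the size described in the abstract are solved with PRISM and hierarchical decomposition rather than by hand like this.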
We can use such reasonably complete models to conduct several "what-if" analyses for many RAID storage systems of interest. Our results show that, depending on the configuration, spanning a RAID group across enclosures may increase or decrease reliability. Another key finding from our model results is that redundancy mechanisms such as multipathing are beneficial only if a single failure of some other component does not cause data inaccessibility of a whole RAID group.
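The enclosure-spanning result can be made plausible with a much simpler static calculation than the thesis's model. The sketch below assumes independent disk and enclosure outages with fixed probabilities over some mission time, and a RAID group that stays accessible while at most `tolerated` of its disks are unavailable. All names and numbers are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P[X <= k] for X ~ Binomial(n, p); zero when k is negative."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(min(k, n) + 1))


def group_availability(n_disks, disks_per_enclosure, p_disk, p_encl,
                       tolerated=1):
    """Probability that a RAID group is accessible (assumed static model).

    The group's disks are split evenly over enclosures. An enclosure
    outage makes every disk inside it unavailable; disks in working
    enclosures fail independently with probability p_disk.
    """
    if n_disks % disks_per_enclosure:
        raise ValueError("disks must divide evenly across enclosures")
    n_encl = n_disks // disks_per_enclosure
    total = 0.0
    for k in range(n_encl + 1):              # k = number of failed enclosures
        lost = k * disks_per_enclosure       # disks made unavailable by them
        if lost > tolerated:
            break                            # group already inaccessible
        p_k = comb(n_encl, k) * p_encl**k * (1 - p_encl) ** (n_encl - k)
        # Remaining disks may contribute at most (tolerated - lost) failures.
        total += p_k * binom_cdf(tolerated - lost, n_disks - lost, p_disk)
    return total


if __name__ == "__main__":
    for label, per_encl in [("one enclosure", 8), ("two enclosures", 4),
                            ("eight enclosures", 1)]:
        print(label, group_availability(8, per_encl, 0.001, 0.005))
```

Under these assumptions, splitting an 8-disk single-parity group over two enclosures lowers availability, because either enclosure outage removes four disks at once. Placing one disk per enclosure raises it when the outage probabilities are small, yet lowers it again when they are larger (for example `p_disk=0.02`, `p_encl=0.01`), since more enclosures are then exposed to failure.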

Bibliographic record

  • Author

    Karmakar Prasenjit

  • Year 2012
  • Original format PDF
  • Language en_US
