Journal: IEEE Transactions on Reliability

Dependability Analysis of Data Storage Systems in Presence of Soft Errors



Abstract

In recent years, the high availability and reliability of data storage systems (DSS) have been significantly threatened by soft errors occurring in storage controllers. Because of their specific functionality and hardware-software stack, error propagation and manifestation in DSS are quite different from those in general-purpose computing architectures. To the best of our knowledge, no previous study has examined the system-level effects of soft errors on the availability and reliability of DSS. In this paper, we first analyze the effects of soft errors occurring in the server processors of storage controllers on the dependability of the entire storage system. To this end, we implement the major functions of a typical data storage system controller, running on a full storage system operating system stack, and develop a framework to perform fault injection experiments using a full system simulator. We then propose a new metric, storage system vulnerability factor (SSVF), to accurately capture the impact of soft errors in storage systems. Extensive experiments reveal that, depending on the controller configuration, up to 40% of cache memory contains end-user data, in which any unrecoverable soft error results in irreversible data loss (DL). Soft errors in the rest of the cache memory, which is filled by the operating system and storage applications, instead result in data unavailability (DU) at the storage system level. Our analysis also shows that detectable unrecoverable errors in the cache data field are the major cause of DU in storage systems, while silent data corruptions in the cache tag and data fields are the main cause of DL.
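The DL/DU distinction drawn in the abstract can be illustrated with a small sketch. This is not the paper's framework and it does not compute SSVF, whose definition the abstract does not give. The names `Region`, `Outcome`, `classify`, and `tally` are hypothetical, and treating every recoverable error as masked is a simplifying assumption made here.

```python
from collections import Counter
from enum import Enum


class Region(Enum):
    """Hypothetical split of cache contents, following the abstract."""
    USER_DATA = "user_data"  # cache lines holding end-user data
    SYSTEM = "system"        # cache lines holding OS / storage application state


class Outcome(Enum):
    MASKED = "masked"              # assumption: recoverable errors have no effect
    DATA_LOSS = "DL"               # irreversible loss of end-user data
    DATA_UNAVAILABILITY = "DU"     # storage service interrupted, data intact


def classify(region: Region, recoverable: bool) -> Outcome:
    """Map one injected soft error to a system-level outcome.

    Rule of thumb taken from the abstract: an unrecoverable error in
    end-user data leads to DL, one in the rest of the cache leads to DU.
    """
    if recoverable:
        return Outcome.MASKED
    if region is Region.USER_DATA:
        return Outcome.DATA_LOSS
    return Outcome.DATA_UNAVAILABILITY


def tally(injections):
    """Count outcomes over an iterable of (region, recoverable) pairs."""
    return Counter(classify(region, recoverable)
                   for region, recoverable in injections)
```

Feeding `tally` a list of injection records gives per-outcome counts, which is the kind of raw tally a fault injection campaign would produce before any vulnerability metric is derived from it.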
