Published in: ACM Transactions on Architecture and Code Optimization
FAULTSIM: A Fast, Configurable Memory-Reliability Simulator for Conventional and 3D-Stacked Systems

Abstract

As memory systems scale, maintaining their Reliability, Availability, and Serviceability (RAS) is becoming more complex. To make matters worse, recent studies of DRAM failures in data centers and supercomputer environments have shown that large-granularity failures are common in DRAM chips. Furthermore, the move toward 3D-stacked memories can make the system vulnerable to newer failure modes, such as those arising from faults in Through-Silicon Vias (TSVs). To architect future systems and to use emerging technology, system designers will need to employ strong error correction and repair techniques. Unfortunately, evaluating the relative effectiveness of these reliability mechanisms is often difficult and is traditionally done with analytical models, which are both error-prone and time-consuming to develop. To this end, this article proposes FaultSim, a fast, configurable memory-reliability simulation tool for 2D and 3D-stacked memory systems. FaultSim employs Monte Carlo simulations driven by real-world failure statistics. We discuss the novel algorithms and data structures used in FaultSim to accelerate the evaluation of different resilience schemes. We implement BCH-1 (SECDED) and ChipKill codes using FaultSim and validate them against an analytical model. The FaultSim results for BCH-1 and ChipKill deviate from the analytical model by only 0.032% and 8.41%, respectively. FaultSim can simulate 1 million Monte Carlo trials (each covering a period of 7 years) of BCH-1 and ChipKill codes in only 34 seconds and 33 seconds, respectively.
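
To make the idea of validating a Monte Carlo reliability estimate against an analytical model concrete, here is a minimal Python sketch. It is not FaultSim's algorithm or code. It assumes a deliberately simplified fault model: each chip fails independently at a constant rate, and a trial counts as a system failure when two or more chips in the rank have faulted within the lifetime, a rough stand-in for a code that corrects a single faulty chip. The function names, the 18-chip rank, and the 500 FIT rate are hypothetical values chosen only for illustration.

```python
import math
import random

HOURS_PER_YEAR = 8760  # non-leap year, a simplifying assumption


def analytic_failure_prob(n_chips, fit_per_chip, years):
    """Closed-form P(at least 2 of n_chips fault within the lifetime).

    Assumes independent chips with exponential time-to-fault;
    FIT = faults per 10^9 device-hours.
    """
    lam = fit_per_chip * 1e-9               # faults per hour per chip
    t = years * HOURS_PER_YEAR              # lifetime in hours
    p = 1.0 - math.exp(-lam * t)            # P(one chip faults by time t)
    p_zero = (1.0 - p) ** n_chips           # no chip faults
    p_one = n_chips * p * (1.0 - p) ** (n_chips - 1)  # exactly one faults
    return 1.0 - p_zero - p_one


def monte_carlo_failure_prob(n_chips, fit_per_chip, years, trials, seed=0):
    """Estimate the same probability by sampling fault arrival times."""
    rng = random.Random(seed)
    lam = fit_per_chip * 1e-9
    t = years * HOURS_PER_YEAR
    failures = 0
    for _ in range(trials):
        # Draw each chip's first-fault time; count chips faulting before t.
        faulty = sum(1 for _ in range(n_chips) if rng.expovariate(lam) < t)
        if faulty >= 2:  # hypothetical "uncorrectable" criterion
            failures += 1
    return failures / trials


if __name__ == "__main__":
    # Hypothetical parameters: 18 chips, 500 FIT per chip, 7-year lifetime.
    exact = analytic_failure_prob(18, 500.0, 7)
    estimate = monte_carlo_failure_prob(18, 500.0, 7, trials=20000, seed=1)
    print(f"analytical: {exact:.4f}  monte carlo: {estimate:.4f}")
```

Under these assumptions the closed form is easy to write down, which is what makes the comparison possible. Once fault granularities, repair, or TSV faults enter the model, the analytical side becomes much harder to derive, which is the situation the abstract describes as motivating simulation.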