Journal: IEEE Transactions on Computers

Configurable-ECC: Architecting a Flexible ECC Scheme to Support Different Sized Accesses in High Bandwidth Memory Systems

Abstract

Designing error correction codes (ECC) that guarantee strong reliability for high bandwidth memory (HBM) is imperative in high performance computers, especially for systems equipped with graphics processing units (GPUs). The design of ECC is challenging because future GPUs are expected to implement a memory subsystem supporting both fine- and coarse-grained data accesses to match the differing spatial locality of GPGPU applications. Current ECC designs, however, are developed for a fixed data fetch granularity. To have a more flexible design, we propose a novel memory protection scheme, called Config(urable)-ECC, which provides strong reliability for both fine- and coarse-grained data accesses. Config-ECC consists of two tiers of ECC protection. The tier-1 code is a strong product code that can correct errors due to small granularity faults and detect errors caused by large granularity faults. The tier-2 code is an XOR-based code that is employed to correct errors incurred by large granularity faults. Config-ECC provides stronger reliability and/or lower energy consumption compared to state-of-the-art fixed 32B and 64B ECC schemes. Compared to a state-of-the-art fixed 64B ECC scheme, it reduces HBM energy by 17-21 percent and the failure in time (FIT) rate by a factor of 20, with an insignificant 1.2 percent performance overhead.
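
The two-tier idea in the abstract (a first tier that detects large faults, and an XOR-based second tier that corrects them) can be illustrated with a minimal Python sketch. This is not the paper's design: the abstract does not give the code construction, so the tier-1 product code is replaced here by a plain CRC-32 checksum used only for detection, and the 32-byte block size, the group of four blocks, and the function names `xor_blocks`, `encode`, and `recover` are all assumptions made for illustration.

```python
import zlib
from functools import reduce

BLOCK_BYTES = 32  # assumed fine-grained access size (illustrative choice)


def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(blocks):
    """Return (per-block checksums, XOR parity block) for a group of blocks."""
    # CRC-32 is only a stand-in for tier-1 detection, not a product code.
    checksums = [zlib.crc32(block) for block in blocks]
    # Hypothetical tier-2 layout: one XOR parity block per group.
    parity = xor_blocks(blocks)
    return checksums, parity


def recover(blocks, checksums, parity):
    """Rebuild at most one block that fails its checksum; raise otherwise."""
    bad = [i for i, block in enumerate(blocks)
           if zlib.crc32(block) != checksums[i]]
    if not bad:
        return list(blocks)
    if len(bad) > 1:
        raise ValueError("more than one faulty block: XOR parity cannot recover")
    # XOR of the parity with all surviving blocks yields the missing block.
    survivors = [block for i, block in enumerate(blocks) if i != bad[0]]
    repaired = list(blocks)
    repaired[bad[0]] = xor_blocks(survivors + [parity])
    return repaired


# Usage: wipe one whole block (a large-granularity fault) and rebuild it.
data = [bytes([i]) * BLOCK_BYTES for i in range(1, 5)]
checksums, parity = encode(data)
damaged = list(data)
damaged[2] = bytes(BLOCK_BYTES)
restored = recover(damaged, checksums, parity)
```

The sketch shows why detection and XOR-based correction work well together: once the first tier has identified which block is faulty, the fault becomes an erasure, and a single parity block is enough to reconstruct it. A single XOR parity cannot handle two faulty blocks in the same group, which is why `recover` raises an error in that case.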