Conference: IEEE International Conference on 3D System Integration

3D memory stacking for fast checkpointing/restore applications


Abstract

As technology scales, modern massively parallel processing (MPP) systems face large system overhead caused by high failure rates. To provide system-level fault tolerance, traditional in-disk checkpointing/restart schemes are usually adopted: system states and memory contents are periodically dumped to hard disk drives (HDDs), and when errors occur, the system is restored by reading the checkpoints back from the HDDs. The low bandwidth and slow speed of HDDs are now becoming the major bottleneck for MPP system performance. Consequently, novel checkpointing schemes are needed to facilitate the move from petascale to exascale computing. We have proposed a 3D memory stacking method [1] that leverages the massive number of through-silicon vias (TSVs) between memory layers to enable high-bandwidth checkpointing/restore. To validate the proposed scheme, we design a 2-layer TSV-based SRAM/SRAM 3D-stacked chip that mimics high-bandwidth, fast data transfer from one memory layer to another, so that an in-memory checkpointing/restore scheme can be enabled for future exascale computing. The capacity of each SRAM layer is 1 Mbit. Each layer contains 64 banks, and each bank contains 256 words of 64 bits. The final footprint, including I/O pads, is 2.9 mm × 2 mm. The SRAM dies were taped out at GlobalFoundries using its 130 nm low-power process, and the 3D stacking was done using Tezzaron's TSV technology. The prototype chip can perform checkpointing/restart at a rate of 4K/cycle with a 1 GHz clock.
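The figures in the abstract can be cross-checked with a short calculation. The sketch below is not from the paper: it assumes that "4K/cycle" means 4096 bits per clock cycle, and that this rate arises from all 64 banks each transferring one 64-bit word in parallel across the TSVs. The function names are hypothetical and chosen only for illustration.

```python
# Back-of-the-envelope check of the abstract's numbers.
# ASSUMPTION (not stated in the abstract): every bank moves one word
# per cycle in parallel, which is what makes "4K/cycle" equal 4096 bits.

BANKS_PER_LAYER = 64          # from the abstract
WORDS_PER_BANK = 256          # from the abstract
BITS_PER_WORD = 64            # from the abstract
CLOCK_HZ = 1_000_000_000      # 1 GHz, from the abstract


def layer_capacity_bits() -> int:
    """Total bits stored in one SRAM layer."""
    return BANKS_PER_LAYER * WORDS_PER_BANK * BITS_PER_WORD


def bits_per_cycle() -> int:
    """Bits moved per cycle, assuming one word per bank per cycle."""
    return BANKS_PER_LAYER * BITS_PER_WORD


def full_layer_checkpoint_cycles() -> int:
    """Cycles needed to copy an entire layer at the assumed rate."""
    return layer_capacity_bits() // bits_per_cycle()


def bandwidth_bytes_per_second() -> int:
    """Layer-to-layer bandwidth in bytes per second at the assumed rate."""
    return bits_per_cycle() * CLOCK_HZ // 8


if __name__ == "__main__":
    print("layer capacity (bits):", layer_capacity_bits())
    print("bits per cycle:", bits_per_cycle())
    print("cycles for a full-layer checkpoint:", full_layer_checkpoint_cycles())
    print("bandwidth (bytes/s):", bandwidth_bytes_per_second())
```

Under these assumptions the numbers are mutually consistent: 64 × 256 × 64 bits is exactly 1 Mbit, 64 × 64 bits is 4096 bits per cycle, and copying a full layer would take 256 cycles, roughly 256 ns at 1 GHz.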
