International Conference for High Performance Computing, Networking, Storage and Analysis

Understanding Soft Error Resiliency of Blue Gene/Q Compute Chip through Hardware Proton Irradiation and Software Fault Injection

Abstract

Soft error resiliency is a major concern for petascale high-performance computing (HPC) systems. Blue Gene/Q (BG/Q) is the third generation of IBM's Blue Gene series of massively parallel, energy-efficient supercomputers. The principal goal of this work is twofold. First, we study the interaction between BG/Q's hardware resiliency features and high-performance applications by irradiating a real chip with protons. Second, we study the software resiliency inherent in these applications through application-level fault injection (AFI) experiments. From the proton irradiation experiments, we derived the sea-level mean time between correctable errors in the SRAM-based register files and Level-1 caches for a system similar in scale to Sequoia. From the AFI experiments, we characterized the relative vulnerability of the applications to faults in both the general-purpose and the floating-point register files. We categorized and quantified the failure outcomes, and identified application characteristics that offer many opportunities to improve error masking.
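
A sea-level error rate like the one described above is obtained by scaling an accelerated-beam measurement to field conditions. The following is a generic first-order form of that scaling. It assumes a single effective cross-section and a single sea-level particle flux. The paper's exact procedure, particle spectra, and constants are not given in this abstract, so treat this as an illustration rather than the authors' derivation:

$$
\sigma = \frac{N_{\text{err}}}{\Phi_{\text{beam}}}, \qquad
\lambda_{\text{chip}} = \sigma\,\phi_{\text{sea}}, \qquad
\text{MTBE}_{\text{system}} = \frac{1}{N_{\text{chips}}\,\lambda_{\text{chip}}} = \frac{\Phi_{\text{beam}}}{N_{\text{err}}\,\phi_{\text{sea}}\,N_{\text{chips}}}
$$

Here $N_{\text{err}}$ is the number of correctable errors observed during irradiation, $\Phi_{\text{beam}}$ is the beam fluence (particles/cm²), $\sigma$ is the resulting cross-section (cm²), $\phi_{\text{sea}}$ is the assumed sea-level flux of the relevant particles (particles/cm²/h), and $N_{\text{chips}}$ is the number of compute chips in the target system. With these units, $\lambda_{\text{chip}}$ is a per-hour error rate and $\text{MTBE}_{\text{system}}$ is in hours.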

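Application-level fault injection of the kind described in the abstract typically works in two steps. It corrupts a value held in a register, then compares the program's result against a fault-free run. The Python sketch below illustrates the idea on a single IEEE 754 double. `flip_bit`, the toy `kernel`, and the outcome labels in `classify` are hypothetical. They are not the authors' AFI tool, and they are not the paper's failure categories.

```python
import math
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE 754 binary64 encoding inverted.

    Bit 0 is the least-significant mantissa bit, bits 52-62 hold the exponent,
    and bit 63 is the sign.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in [0, 63]")
    # Reinterpret the double as a 64-bit unsigned integer, XOR one bit, convert back.
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


def kernel(x: float) -> float:
    """Hypothetical toy computation: rounding to 2 decimal places means
    small perturbations of x are logically masked."""
    return round(x, 2)


def classify(golden: float, faulty: float) -> str:
    """Illustrative outcome labels (an assumption, not taken from the paper)."""
    if not math.isfinite(faulty):
        return "non-finite result (NaN/Inf)"
    if faulty == golden:
        return "masked"
    return "silent data corruption"


if __name__ == "__main__":
    x = 1.0                      # fault-free "register" value
    golden = kernel(x)           # reference result from the fault-free run
    for bit in (0, 51, 62, 63):  # low mantissa, high mantissa, exponent, sign
        faulty = kernel(flip_bit(x, bit))
        print(f"bit {bit:2d}: {faulty!r:>6} -> {classify(golden, faulty)}")
```

When run as a script, the rounding in `kernel` masks a flip of bit 0. Flips of bits 51 and 63 yield silent data corruption (1.5 and -1.0). A flip of bit 62 turns 1.0 into infinity. A real campaign would sweep the bit position and the injection point over many runs to estimate how much of a register file's raw vulnerability the application masks.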