Conference: IEEE High Performance Extreme Computing Conference

Havens: Explicit reliable memory regions for HPC applications

Abstract

Supporting error resilience in future exascale-class supercomputing systems is a critical challenge. Due to transistor scaling trends and increasing memory density, scientific simulations are expected to experience more interruptions caused by transient errors in the system memory. Existing hardware-based detection and recovery techniques will be inadequate to manage the presence of high memory fault rates. In this paper we propose a partial memory protection scheme based on region-based memory management. We define the concept of regions called havens that provide fault protection for program objects. We provide reliability for the regions through a software-based parity protection mechanism. Our approach enables critical program objects to be placed in these havens. The fault coverage provided by our approach is application agnostic, unlike algorithm-based fault tolerance techniques.
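To make the idea of a parity-protected region concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the haven API, the block size, or the parity layout, so every name here (`Haven`, `alloc`, `write`, `read`, `BLOCK`) is hypothetical. The sketch assumes one XOR parity block per region and adds a per-block CRC32 to locate a damaged block. The CRC is an assumption of this sketch, since parity alone cannot say which block is corrupted.

```python
import zlib

BLOCK = 64  # bytes per protected block (hypothetical choice)


class Haven:
    """Toy reliable region: bump allocator plus XOR parity across blocks."""

    def __init__(self, nblocks):
        self.data = bytearray(nblocks * BLOCK)
        self.parity = bytearray(BLOCK)  # XOR of all blocks (all zero at start)
        self.crcs = [zlib.crc32(bytes(BLOCK))] * nblocks
        self.top = 0  # bump-allocation offset

    def alloc(self, size):
        """Reserve `size` bytes in the region and return their offset."""
        if self.top + size > len(self.data):
            raise MemoryError("haven is full")
        off, self.top = self.top, self.top + size
        return off

    def _block(self, i):
        return self.data[i * BLOCK:(i + 1) * BLOCK]

    def _xor_into_parity(self, blk):
        for j, b in enumerate(blk):
            self.parity[j] ^= b

    def write(self, off, payload):
        """Store bytes, then refresh parity and checksums of touched blocks."""
        if off < 0 or off + len(payload) > len(self.data):
            raise IndexError("write outside the haven")
        first, last = off // BLOCK, (off + len(payload) - 1) // BLOCK
        for i in range(first, last + 1):  # cancel old contents out of parity
            self._xor_into_parity(self._block(i))
        self.data[off:off + len(payload)] = payload
        for i in range(first, last + 1):  # fold new contents into parity
            blk = self._block(i)
            self._xor_into_parity(blk)
            self.crcs[i] = zlib.crc32(blk)

    def read(self, off, size):
        """Verify touched blocks, repair a corrupted one, return the bytes."""
        for i in range(off // BLOCK, (off + size - 1) // BLOCK + 1):
            if zlib.crc32(self._block(i)) != self.crcs[i]:
                self._repair(i)
        return bytes(self.data[off:off + size])

    def _repair(self, bad):
        """Rebuild block `bad` as parity XOR every other block."""
        rebuilt = bytearray(self.parity)
        for i in range(len(self.crcs)):
            if i != bad:
                for j, b in enumerate(self._block(i)):
                    rebuilt[j] ^= b
        if zlib.crc32(rebuilt) != self.crcs[bad]:
            raise RuntimeError("uncorrectable: more than one block is damaged")
        self.data[bad * BLOCK:(bad + 1) * BLOCK] = rebuilt


if __name__ == "__main__":
    haven = Haven(nblocks=4)
    off = haven.alloc(16)
    haven.write(off, b"critical state!!")
    haven.data[off + 3] ^= 0x10  # simulate a transient bit flip
    print(haven.read(off, 16))   # the flip is detected and repaired on read
```

Only objects placed in the haven pay the cost of parity updates on each write, which is the trade-off that partial protection makes. This sketch recovers from damage confined to a single block; if two blocks are damaged, it raises an error instead of returning wrong data.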
