Conference: IEEE/ACM International Symposium on Microarchitecture

Encore: Low-cost, fine-grained transient fault recovery

Abstract

To meet an insatiable consumer demand for greater performance at lower power, silicon technology has scaled to unprecedented dimensions. However, the pursuit of faster processors and longer battery life has come at the cost of reliability. Given the rise of processor reliability as a first-order design constraint, there has been growing interest in low-cost, non-intrusive techniques for transient fault detection. Many of these recent proposals count on the availability of hardware recovery mechanisms. Although common in aggressive out-of-order cores, hardware support for speculative rollback and recovery is less common in lower-end commodity processors. This paper presents Encore, a software-based fault recovery mechanism tailored for these lower-cost systems that lack native hardware support for speculative rollback recovery. Encore combines program analysis, profile data, and simple code transformations to create statistically idempotent code regions that can recover from faults at very little cost. Using this software-only, compiler-based approach, Encore provides the ability to recover from transient faults without specialized hardware or the costs of traditional, full-system checkpointing solutions. Experimental results show that Encore, with just 14% runtime overhead, can safely recover from 97% of transient faults on average when coupled with existing detection schemes.
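The general idea of recovering by re-executing an idempotent region can be illustrated with a small sketch. This is not Encore's implementation, which the abstract describes as compiler-based; it is a hand-written Python analogy under stated assumptions. The names `DetectedFault`, `run_region_with_recovery`, `make_faulty_region`, and the `clobbered` list are all hypothetical, and the exception merely stands in for a signal from a separate fault detection scheme. The sketch assumes that a region becomes safe to re-run once the few inputs it overwrites are saved beforehand.

```python
from typing import Callable, Dict, List


class DetectedFault(Exception):
    """Hypothetical stand-in for a signal raised by a separate fault detector."""


def run_region_with_recovery(
    region: Callable[[Dict], None],
    state: Dict,
    clobbered: List[str],
    max_retries: int = 3,
) -> int:
    """Run `region`, re-executing it from the start if a fault is detected.

    Only the entries named in `clobbered` (inputs the region reads and then
    overwrites) are saved; everything else is assumed safe to recompute.
    Returns the number of re-executions that were needed.
    """
    saved = {key: state[key] for key in clobbered}  # lightweight checkpoint
    for attempt in range(max_retries + 1):
        try:
            region(state)
            return attempt
        except DetectedFault:
            state.update(saved)  # restore overwritten inputs, then retry
    raise RuntimeError("region did not complete within the retry budget")


def make_faulty_region(fail_times: int) -> Callable[[Dict], None]:
    """Build a toy region that reports a detected fault `fail_times` times."""
    remaining = {"n": fail_times}

    def region(state: Dict) -> None:
        state["count"] += 1                  # read-then-write: must be checkpointed
        state["total"] = sum(state["data"])  # write-only output: safe to redo
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise DetectedFault()

    return region


state = {"data": [1, 2, 3], "count": 0, "total": 0}
retries = run_region_with_recovery(make_faulty_region(2), state, ["count"])
```

In this toy run, only `count` needs saving because it is the only value the region reads before overwriting. Without that small checkpoint, each retry would increment it again and the region would not be idempotent.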