Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Two-State Checkpointing for Energy-Efficient Fault Tolerance in Hard Real-Time Systems

Abstract

Checkpointing with rollback recovery is a well-established technique for tolerating transient faults. However, it incurs significant time and energy overheads that are wasted in fault-free execution states and may make the technique infeasible in hard real-time systems. This paper presents a low-overhead two-state checkpointing (TsCp) scheme for fault-tolerant hard real-time systems. It differentiates between fault-free and faulty execution states and uses a different type of checkpoint interval for each. The first type is nonuniform intervals, used while no fault has occurred. These intervals are determined by postponing checkpoint insertions in fault-free states, with the aim of decreasing the number of checkpoint insertions. The second type is uniform intervals, used from the time the first fault occurs. They are determined so as to minimize execution time in faulty states, leaving more time available for energy management in fault-free states. Experimental evaluation on an embedded processor (LEON3) with an emerging nonvolatile memory technology (ReRAM) shows that TsCp reduces the number of checkpoints by 62% on average compared with previous works, while preserving fault tolerance. This reduces execution time and energy consumption by 14% and 13%, respectively. Furthermore, combining TsCp with dynamic voltage scaling (DVS) achieves up to 26% (21% on average) energy saving compared with state-of-the-art techniques.
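The abstract gives no formulas, so the following Python sketch only illustrates the two-state idea under a simplified model of our own; it is not the paper's derivation. Assumptions (all hypothetical): a checkpoint costs a fixed `cp_cost`, faults are detected at the next checkpoint, each fault forces re-execution of one interval plus its checkpoint, and rollback itself is free. The function names `uniform_plan` and `fault_free_intervals` are invented for this example.

```python
import math


def uniform_plan(work, faults, cp_cost):
    """Faulty state: pick the number n of equal intervals that minimizes
    the worst-case time  work + n*cp_cost + faults*(work/n + cp_cost).
    (Assumed textbook-style model, not taken from the paper.)"""
    # The expression is convex in n with its real-valued minimum at
    # sqrt(faults*work/cp_cost), so a search slightly past that suffices.
    upper = max(1, math.ceil(math.sqrt(faults * work / cp_cost)) + 1)
    best = None
    for n in range(1, upper + 1):
        t = work + n * cp_cost + faults * (work / n + cp_cost)
        if best is None or t < best[1]:
            best = (n, t)
    return best  # (interval count, worst-case time)


def fault_free_intervals(work, deadline, faults, cp_cost):
    """Fault-free state: postpone each checkpoint as long as the deadline
    still holds if a fault hits the current interval and the rest of the
    task falls back to uniform checkpointing with one fewer fault.
    (Hypothetical greedy rule used only for illustration.)"""
    if faults < 1:
        raise ValueError("expects at least one fault to tolerate")
    intervals = []
    remaining, time_left = work, deadline
    while remaining > 1e-9:
        _, recovery = uniform_plan(remaining, faults - 1, cp_cost)
        # Longest interval x with  x + cp_cost + recovery <= time_left.
        x = min(remaining, time_left - cp_cost - recovery)
        if x <= 0:
            raise ValueError("deadline cannot be guaranteed in this model")
        intervals.append(x)
        remaining -= x
        time_left -= x + cp_cost
    return intervals


if __name__ == "__main__":
    # Made-up task parameters, in arbitrary time units.
    work, deadline, faults, cp_cost = 100.0, 200.0, 2, 2.0
    n_uniform, worst_case = uniform_plan(work, faults, cp_cost)
    nonuniform = fault_free_intervals(work, deadline, faults, cp_cost)
    print("uniform intervals:", n_uniform, "worst case:", worst_case)
    print("fault-free intervals:", [round(x, 2) for x in nonuniform])
```

In this toy model, a task with ample slack needs far fewer checkpoints in a fault-free run than an always-uniform plan would insert, which is the qualitative effect the abstract describes. The percentages reported in the abstract come from the authors' experiments and cannot be reproduced from this sketch.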
