Source: IEEE Transactions on Computers

Fine-Grained Checkpoint Recovery for Application-Specific Instruction-Set Processors



Abstract

Checkpoint recovery (CR) is a classic fault-tolerance technique that enables computing systems to execute correctly even when affected by transient faults. Although a number of software- and hardware-based approaches to CR exist, they are usually either too large or too slow, or they require extensive modifications to the software and the caching/memory schemes. In this paper, we propose a novel CR approach based on re-engineering the instruction set of a target processor. We take the base instruction set and augment its native micro-operations, described in an architectural description language (ADL), with additional micro-operations that perform checkpointing at the granularity of basic blocks. The recovery mechanism is realized by three custom instructions, which undo the corruption caused by transient faults during instruction execution by restoring incorrectly modified values in general-purpose registers, data memory, and special-purpose registers (PC, status registers, etc.). Our checkpoint storage is sized according to the application program executed. The experimental results show that our approach degrades system performance by just 0.76 percent when there is no fault, and introduces an area overhead of 44 percent on average and 79 percent in the worst case. During fault-injection tests with the benchmark applications, recovery took at most 62 clock cycles.
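To make the idea of basic-block checkpointing concrete, here is a minimal software sketch in Python. It is not the paper's hardware design. The class `CheckpointedCore`, its method names, and the undo-log strategy (saving each overwritten value and replaying the log in reverse) are all assumptions made for illustration; the abstract states only that registers, data memory, and special-purpose registers such as the PC are restored.

```python
class CheckpointedCore:
    """Toy processor state with a per-basic-block undo log (hypothetical model)."""

    def __init__(self, num_regs=8, mem_size=16):
        self.regs = [0] * num_regs   # general-purpose registers
        self.mem = [0] * mem_size    # data memory
        self.pc = 0                  # program counter (a special-purpose register)
        self._undo_log = []          # (kind, index, old_value) for the current block
        self._checkpoint_pc = 0      # PC value saved at the start of the block

    def begin_block(self):
        """Commit the previous basic block and open a new checkpoint."""
        self._undo_log.clear()
        self._checkpoint_pc = self.pc

    def write_reg(self, idx, value):
        """Write a register, first logging the value it overwrites."""
        self._undo_log.append(("reg", idx, self.regs[idx]))
        self.regs[idx] = value

    def write_mem(self, addr, value):
        """Write data memory, first logging the value it overwrites."""
        self._undo_log.append(("mem", addr, self.mem[addr]))
        self.mem[addr] = value

    def rollback(self):
        """Undo every logged write in reverse order, then restore the PC."""
        while self._undo_log:
            kind, idx, old = self._undo_log.pop()
            if kind == "reg":
                self.regs[idx] = old
            else:
                self.mem[idx] = old
        self.pc = self._checkpoint_pc
```

In this sketch the log only has to hold the writes of one basic block, so it can be sized from the largest block in the program. That is one plausible reading of why the abstract says checkpoint storage is sized according to the application executed.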
