Online Fault Tolerance for FPGA Logic Blocks


Abstract

Most adaptive computing systems use reconfigurable hardware in the form of field programmable gate arrays (FPGAs). For these systems to be fielded in harsh environments where high reliability and availability are a must, the applications running on the FPGAs must tolerate hardware faults that may occur during the lifetime of the system. In this paper, we present new fault-tolerant techniques for FPGA logic blocks, developed as part of the roving self-test areas (STARs) approach to online testing, diagnosis, and reconfiguration. Our techniques can handle large numbers of faults (we show tolerance of over 100 logic faults via actual implementation on an FPGA consisting of a 20 $\times$ 20 array of logic blocks). A key novel feature is the reuse of defective logic blocks to increase the number of effective spares and extend the mission life. To increase fault tolerance, we not only use nonfaulty parts of defective or partially faulty logic blocks, but we also use faulty parts of defective logic blocks in nonfaulty modes. By using and reusing faulty resources, our multilevel approach extends the number of tolerable faults beyond the number of currently available spare logic resources. Unlike many column, row, or tile-based methods, our multilevel approach can tolerate not only faults that are evenly distributed over the logic area, but also clusters of faults in the same local area. Furthermore, system operation is not interrupted for fault diagnosis or for computing fault-bypassing configurations. Our fault tolerance techniques have been implemented using ORCA 2C series FPGAs, which feature incremental dynamic runtime reconfiguration.
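
The idea of using faulty parts of a logic block "in nonfaulty modes" can be illustrated with a small sketch. It is not the paper's implementation. It assumes a simplified fault model in which a lookup table (LUT) memory cell is stuck at 0 or 1, so a function can still be placed in that LUT whenever its truth table already requires the stuck value at the faulty address. The names `StuckCell`, `truth_table`, `lut_compatible`, and `pick_block`, and the policy of preferring the most defective usable block, are hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class StuckCell:
    """Assumed fault model: one LUT memory cell stuck at a fixed value."""
    address: int  # LUT address of the faulty cell
    value: int    # value the cell is stuck at (0 or 1)


def truth_table(fn: Callable[..., int], n_inputs: int) -> list[int]:
    """Enumerate fn over all input combinations; input i is bit i of the address."""
    return [
        int(fn(*(((addr >> i) & 1) for i in range(n_inputs))))
        for addr in range(2 ** n_inputs)
    ]


def lut_compatible(table: list[int], faults: list[StuckCell]) -> bool:
    """True if every stuck cell already holds the bit the function needs there."""
    return all(table[f.address] == f.value for f in faults)


def pick_block(table: list[int], blocks: dict[str, list[StuckCell]]) -> Optional[str]:
    """Hypothetical policy: among usable blocks, prefer the one with the most
    faults, so that fault-free blocks stay available as spares."""
    usable = [name for name, faults in blocks.items() if lut_compatible(table, faults)]
    if not usable:
        return None
    return max(usable, key=lambda name: len(blocks[name]))
```

Under these assumptions, a block whose cell at address 3 is stuck at 0 cannot host a 2-input AND (which needs a 1 there) but can host a 2-input XOR, so `pick_block` would place the XOR on the defective block and leave a fault-free block free for the AND.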
