IEEE Transactions on Computers

Run-Time Recovery Mechanism for Transient and Permanent Hardware Faults Based on Distributed, Self-Organized Dynamic Partially Reconfigurable Systems


Abstract

Field-Programmable Gate Arrays (FPGAs) are rapidly gaining popularity as implementation platforms for complex space-borne computing systems. However, such systems are exposed to cosmic radiation at levels orders of magnitude higher than on Earth, which can cause transient and even permanent hardware faults in on-board computing platforms. Effective fault mitigation and self-repair mechanisms have therefore become vital for FPGA-based space-borne computing platforms. This work presents a novel method for transient and permanent fault mitigation and run-time fault recovery in commercial-grade FPGA devices with partially reconfigurable, tile-based architectures. Through dynamic on-chip component relocation, the method guarantees the same pre-determined recovery time for transient and permanent hardware faults alike. It relies on fully distributed control, communication, self-synchronization and self-integration mechanisms embedded in each on-chip hardware component, and run-time collaboration between components carries out the relocation and fault mitigation procedures. Because these mechanisms are distributed, they eliminate most of the central points of failure that could otherwise cause unrecoverable system faults. The method has been implemented, tested and verified on a Xilinx Kintex-7 FPGA platform. Results show that it is significantly more resource-efficient than Triple Modular Redundancy (TMR) or central, software-based control mechanisms.
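As a rough illustration of how transient and permanent faults can share one recovery path with a fixed cost, the following Python sketch models a row of reconfigurable tiles. Everything in it (`Tile`, `pick_target`, `recover`, `RECOVERY_CYCLES`, and the "nearest free healthy tile" rule) is a hypothetical simplification, not the authors' implementation. In particular, it assumes every component holds an identical view of tile occupancy, so that each one can evaluate the same deterministic relocation rule locally without a central controller; the paper instead realizes control, communication and synchronization in hardware embedded in each component.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical fixed recovery cost (relocate + reconfigure + resynchronize).
# It is the same constant no matter which kind of fault occurred.
RECOVERY_CYCLES = 4


@dataclass
class Tile:
    index: int
    component: Optional[str] = None  # None means the tile is a free spare
    faulty: bool = False             # True means permanently damaged, never reused


def pick_target(tiles: List[Tile], failed_index: int) -> Tile:
    """Deterministic rule any component can evaluate locally (assuming a shared
    view of occupancy): nearest free, healthy tile, ties broken by lower index."""
    candidates = [t for t in tiles
                  if t.component is None and not t.faulty and t.index != failed_index]
    if not candidates:
        raise RuntimeError("no spare tile available")
    return min(candidates, key=lambda t: (abs(t.index - failed_index), t.index))


def recover(tiles: List[Tile], failed_index: int, permanent: bool) -> Tuple[int, int]:
    """Relocate the component hosted on a failed tile. Transient and permanent
    faults take the same path; the only difference is whether the old tile is
    retired (permanent) or returned to the spare pool (transient)."""
    failed = tiles[failed_index]
    target = pick_target(tiles, failed_index)
    target.component, failed.component = failed.component, None
    failed.faulty = permanent
    return target.index, RECOVERY_CYCLES


tiles = [Tile(0, "A"), Tile(1, "B"), Tile(2), Tile(3, "C"), Tile(4)]
print(recover(tiles, 1, permanent=False))  # (2, 4): B moves to tile 2, tile 1 becomes a spare
print(recover(tiles, 3, permanent=True))   # (4, 4): C moves to tile 4, tile 3 is retired
```

Both calls report the same recovery cost, which mirrors the abstract's claim of a pre-determined recovery time independent of fault type; a real design would also need to handle disagreement between components' views, which this sketch deliberately ignores.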
