IEEE Transactions on Computers

Resource Conscious Diagnosis and Reconfiguration for NoC Permanent Faults

Abstract

Networks-on-chip (NoCs) have been increasingly adopted in recent years due to the extensive integration of many components in modern multicore processors and system-on-chip designs. At the same time, transistor reliability is becoming a major concern due to the continuous scaling of silicon. As the sole medium of on-chip communication, a NoC must be able to tolerate many permanent transistor failures. In this paper, we propose uDIREC, a unified framework for permanent fault diagnosis and subsequent reconfiguration in NoCs, which provides graceful performance degradation as the number of faults increases. Upon in-field transistor failures, uDIREC leverages a fine-resolution diagnosis mechanism to disable faulty components very sparingly. At its core, uDIREC employs MOUNT, a novel routing algorithm that finds reliable, deadlock-free routes utilizing all the still-functional links in the NoC. We implement uDIREC's reconfiguration as a truly distributed hardware solution while keeping the area overhead to a minimum. We also propose a software-implemented reconfiguration that integrates more closely with our software-based diagnosis scheme, at the cost of giving up a distributed implementation. Regardless of the implementation scheme adopted, uDIREC places no restrictions on the topology, the router architecture, or the number and location of faults. Experimental results show that uDIREC, implemented in a 64-node NoC, drops 3 fewer nodes and provides a throughput improvement of more than 25 percent (beyond 15 faults) compared with other state-of-the-art fault-tolerance solutions. uDIREC's improvement over prior art grows further with more faults, making it an effective NoC reliability solution for a wide range of fault rates.
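The abstract does not describe how MOUNT computes its routes, so the sketch below does not attempt to reproduce it. As a hedged point of comparison, it implements up*/down* routing, a classic topology-agnostic, deadlock-free scheme, over whatever links survive after faulty ones are disabled. Assumptions, all of them hypothetical: faults are modeled at the granularity of individual directed links, nodes are ranked by BFS hop distance from a root (ties broken by node id), a 2D mesh serves only as a test topology, and the helper names `mesh_links`, `node_order`, `reachable_from` and `unroutable_pairs` are illustrative. A route is legal if it takes zero or more "up" hops (toward a lower rank) followed by zero or more "down" hops; forbidding the down-to-up turn removes every cycle from the channel dependency graph, which is what makes the routing deadlock-free.

```python
from collections import deque
from itertools import product

# Hypothetical sketch: classic up*/down* routing over the links that survive
# after faulty ones are disabled. This is NOT the paper's MOUNT algorithm.


def mesh_links(width, height):
    """Directed links of a width x height 2D mesh (node id = y * width + x).
    The mesh is only a test topology; the routing code accepts any link set."""
    links = set()
    for x, y in product(range(width), range(height)):
        u = y * width + x
        if x + 1 < width:
            links |= {(u, u + 1), (u + 1, u)}
        if y + 1 < height:
            links |= {(u, u + width), (u + width, u)}
    return links


def node_order(num_nodes, links, root=0):
    """Strict total order on nodes: (BFS hop distance from root, node id).
    Distances treat a link as usable in either direction; nodes the BFS
    never reaches are ranked after all others."""
    neighbours = {n: set() for n in range(num_nodes)}
    for u, v in links:
        neighbours[u].add(v)
        neighbours[v].add(u)
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(neighbours[u]):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return {n: (dist.get(n, num_nodes), n) for n in range(num_nodes)}


def reachable_from(src, links, order):
    """Destinations reachable from src by a legal up*/down* route: any number
    of 'up' hops (toward a smaller rank), then any number of 'down' hops.
    The down -> up turn is never taken, which keeps the channel dependency
    graph acyclic and therefore the routing deadlock-free."""
    out = {}
    for u, v in links:
        out.setdefault(u, []).append(v)
    seen = {(src, "up")}
    queue = deque(seen)
    while queue:
        u, phase = queue.popleft()
        for v in out.get(u, []):
            up = order[v] < order[u]
            if phase == "down" and up:
                continue  # forbidden down -> up turn
            state = (v, "up" if up else "down")
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return {node for node, _ in seen}


def unroutable_pairs(num_nodes, links, faulty):
    """(src, dst) pairs with no legal route once the faulty links are disabled."""
    alive = links - faulty
    order = node_order(num_nodes, alive)
    bad = set()
    for src in range(num_nodes):
        reach = reachable_from(src, alive, order)
        bad |= {(src, dst) for dst in range(num_nodes) if dst not in reach}
    return bad


if __name__ == "__main__":
    links = mesh_links(4, 4)
    print(len(unroutable_pairs(16, links, set())))  # 0: fault-free mesh
```

For a mesh the size of the abstract's 64-node NoC, one would call `unroutable_pairs(64, mesh_links(8, 8), faulty)`. With `faulty={(1, 0)}` on a 4x4 mesh, node 1 can no longer reach node 0 under this rule, even though the physical path 1→5→4→0 survives. That is a limitation of this baseline sketch, not a measured property of uDIREC; the abstract states that MOUNT is designed to use all still-functional links.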