Published in: IEEE International Conference on Computer Design

A fine-grained link-level fault-tolerant mechanism for networks-on-chip



Abstract

Silicon technology scaling is continuously enabling denser integration capabilities. However, this comes at the expense of higher variability and susceptibility to wear-out. With an escalating number of on-chip components expected to be defective in near-future chips, modern parallel systems, such as Chip Multi-Processors (CMP), become especially vulnerable to these faults. Just a single link failure in the underlying Network-on-Chip (NoC) may cause inter-tile communication to halt and even deadlock, rendering the chip useless. While fault-tolerant routing schemes do exist, they can only handle a finite number of link faults. In this paper, we address permanent wire failures which can occur in on-chip parallel links at manufacture time or while in operation. Instead of marking the entire link as faulty, we present a methodology where the Partially Faulty Link (PFL) can still be used to transfer data between NoC routers, thus maintaining network connectivity, extending the yield and lifetime of the chip, and allowing for graceful performance degradation. To achieve this, we devise architectural augmentations both to the router and link micro-architectures, along with link fault detection, diagnosis, and re-configuration at the level of wire granularity. Statistical link-level fault models demonstrate the usability of PFLs, while relevant load-balancing routing algorithms and low-cost re-transmission mechanisms are also presented and coupled to the proposed architecture. Hardware synthesis demonstrates the feasibility of the proposed extensions to the base NoC architecture. Results obtained from full-system simulations show that high-performance NoCs are realizable in the presence of PFLs.
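The idea of keeping a partially faulty link in service can be illustrated with a small sketch. The abstract does not describe the actual hardware mechanism, so everything below is an assumption made for illustration: the function names, the bit-mask representation of faulty wires, and the simple scheme of serialising a flit over the remaining healthy wires across several cycles.

```python
def healthy_wires(width, fault_mask):
    """Return indices of wires not marked faulty (bit i set in fault_mask = wire i faulty)."""
    return [i for i in range(width) if not (fault_mask >> i) & 1]


def send_over_pfl(flit, width, fault_mask):
    """Hypothetical sender: split a `width`-bit flit into cycles that use healthy wires only.

    Returns a list of cycles; each cycle maps wire index -> bit value.
    """
    wires = healthy_wires(width, fault_mask)
    if not wires:
        raise ValueError("link is fully faulty; no wire can carry data")
    bits = [(flit >> i) & 1 for i in range(width)]  # least-significant bit first
    cycles = []
    for start in range(0, width, len(wires)):
        chunk = bits[start:start + len(wires)]
        cycles.append(dict(zip(wires, chunk)))  # last cycle may use fewer wires
    return cycles


def receive_over_pfl(cycles, width, fault_mask):
    """Hypothetical receiver: reassemble the flit using the same fault mask as the sender."""
    wires = healthy_wires(width, fault_mask)
    flit, pos = 0, 0
    for cycle in cycles:
        for w in wires:
            if pos >= width:  # all bits recovered; ignore unused wires
                break
            flit |= cycle[w] << pos
            pos += 1
    return flit


if __name__ == "__main__":
    mask = 0b00100100  # assumed example: wires 2 and 5 of an 8-wire link are faulty
    cycles = send_over_pfl(0xA7, 8, mask)
    print(len(cycles), hex(receive_over_pfl(cycles, 8, mask)))
```

In this assumed example, an 8-bit flit on a link with 6 healthy wires takes 2 cycles instead of 1, which is one way to picture the graceful performance degradation the abstract mentions: the link stays connected at reduced bandwidth rather than being disabled.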
