
Fault Tolerant Network-on-Chip Router Architectures for Multi-Core Architectures



Abstract

As feature sizes scale down to the deep nanometer regime, designers can fabricate chips with billions of transistors. Such abundant computational resources on a single chip have made it possible to design chips with multiple computational cores, giving rise to Chip Multiprocessors (CMPs). The widespread use of CMPs has shifted the paradigm from computation-centric to communication-centric architectures. As the number of cores that can be fabricated on a single chip keeps increasing, communication between the cores has become a crucial factor in overall chip performance. The Network-on-Chip (NoC) paradigm has evolved into a standard on-chip interconnection network that can efficiently handle the strict communication requirements between the cores on a chip. An NoC consists of routers, which route data between multiple cores, and links, which provide the raw bandwidth for data traversal.

While diminishing feature size has made it possible to integrate billions of transistors on a chip, the advantage of multiple cores has been marred by the waning reliability of transistors. NoC components are not immune to the increasing number of hard faults and soft errors caused by the extreme miniaturization of transistors. Faults in an NoC have significant ramifications, such as isolation of healthy cores, deadlock, data corruption, packet loss and increased packet latency, all of which severely affect the performance of a chip. This has stimulated the need to design resilient, fault tolerant NoCs.

This thesis addresses fault tolerance in NoC routers, focusing specifically on the router pipeline, which is responsible for the smooth flow of packets. We propose two fault tolerant architectures that continue to operate in the presence of faults. In addition to these two architectures, we propose a new reliability metric for evaluating soft error tolerant techniques targeted at the control logic of the NoC router pipeline.

First, we present Shield, a fault tolerant NoC router architecture that handles both hard faults and soft errors in its pipeline. Shield achieves hard fault tolerance through spatial redundancy, exploitation of idle resources and bypassing of faulty resources. With these techniques, Shield is six times more reliable than an unprotected baseline router. To handle soft errors, Shield uses selective hardening, in which specific gates of the router pipeline are hardened to increase its soft error tolerance. To quantify the improvement in soft error tolerance, we propose a new metric called the Soft Error Improvement Factor (SEIF) and use it to show that Shield's soft error tolerance is three times better than that of the unprotected baseline router.

Then, we present the Soft Error Tolerant NoC Router (STNR), a low overhead fault tolerant NoC router architecture that tolerates soft errors in the control logic of its pipeline. STNR achieves soft error tolerance through dual execution, comparison and rollback. It exploits idle cycles in the router pipeline to perform the redundant computation and comparison needed for soft error detection. When a soft error is detected, the pipeline is rolled back to the stage that was affected by the soft error. Salient features of STNR include a high level of soft error detection, fault containment and minimal impact on latency. Simulations show that STNR detects all injected single soft errors in the router pipeline. To compare STNR quantitatively with existing similar architectures, we propose a new reliability metric called the Metric for Soft error Tolerance (MST). MST is unique in that it combines four crucial factors, namely soft error tolerance, area overhead, power overhead and pipeline latency overhead, into a single metric. Analysis using MST shows that STNR provides better reliability than existing architectures while incurring low overhead.
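The dual execution, comparison and rollback idea behind STNR can be illustrated with a minimal Python sketch. This is a toy model under stated assumptions, not the thesis's hardware design: the names `FaultyStage`, `execute_with_rollback` and `port_select` are hypothetical, a soft error is modeled as a single bit flip in one stage output, and the use of idle pipeline cycles for the redundant run is not modeled.

```python
from typing import Callable, Iterable, Tuple


class FaultyStage:
    """Toy pipeline stage whose output is corrupted on chosen invocations.

    Hypothetical helper: the corruption models a transient single-bit
    soft error in the stage's control logic.
    """

    def __init__(self, logic: Callable[[int], int], corrupt_calls: Iterable[int]):
        self.logic = logic
        self.corrupt_calls = set(corrupt_calls)  # 1-based call numbers to corrupt
        self.calls = 0

    def __call__(self, x: int) -> int:
        self.calls += 1
        out = self.logic(x)
        if self.calls in self.corrupt_calls:
            return out ^ 1  # flip the lowest bit
        return out


def execute_with_rollback(stage: Callable[[int], int], x: int,
                          max_rollbacks: int = 3) -> Tuple[int, int]:
    """Run the stage twice, compare, and redo the stage on a mismatch.

    Returns (result, number_of_rollbacks). Only the affected stage is
    re-executed, which is the sense in which the error is contained.
    """
    for rollbacks in range(max_rollbacks + 1):
        first = stage(x)    # original execution
        second = stage(x)   # redundant execution
        if first == second:  # comparison: agreement means no error detected
            return first, rollbacks
    raise RuntimeError("mismatch persisted; fault is probably not transient")


def port_select(dest: int) -> int:
    """Hypothetical control logic: 2-bit output port derived from a destination id."""
    return dest & 0b11


# Inject one soft error into the first execution of the stage.
stage = FaultyStage(port_select, corrupt_calls={1})
result, rollbacks = execute_with_rollback(stage, 6)
print(result, rollbacks, stage.calls)
```

In this sketch the first pair of executions disagrees, so the stage is re-executed once and the second pair agrees. A single transient error is always detected because it can corrupt only one of the two runs, which matches the single soft error scenario described in the abstract.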
