Faults in on-chip networks are very common. Multiple approaches exist to detect and correct faults at different levels, such as the error-detecting codes (EDC) and error-correcting codes (ECC) used in data packets, the cyclic redundancy check (CRC) and parity fields used in 802.11 a/b/g frames, the retry/discard mechanisms defined in several bus protocols, and common design-flow solutions such as built-in self-test (BIST) and scan chains. In this paper, the authors present a scheme, online fault detection and tolerance (ODT), which focuses on transient faults caused by illegal turns of packets. With ODT, the reliability of the on-chip network is improved, and a faulty router does not necessarily have to be turned off. Compared with the DyXY routing algorithm, the transmission latency is almost the same when no faults occur. Deadlocks can be reduced, and on average around 20 percent more packets survive.