2010 9th IEEE International Symposium on Network Computing and Applications

Fault-Tolerant Deadlock-Free Adaptive Routing for Any Set of Link and Node Failures in Multi-cores Systems



Abstract

Future applications will require processors with many cores communicating through a regular interconnection network. Meanwhile, as deep submicron technology foreshadows an era of highly defective chips, fault-tolerant designs become compulsory. In particular, the fault tolerance of the core interconnect is critical, and inevitably increases its complexity. In this paper, we present a novel adaptive routing algorithm that is able to route messages in the presence of any set of multiple node and link failures, as long as a path exists. Compared to existing solutions, the proposed algorithm provides fault tolerance without using any routing table. It is scalable and can be applied to multicore chips with a 2D mesh core interconnect of any size. Based on Virtual Networks and Virtual Channels, the algorithm is deadlock-free and avoids infinite looping in both fault-free and faulty 2D meshes. We simulated the proposed algorithm under the worst-case scenario with regard to traffic patterns, with failure rates of up to 40%. Experimental results confirmed that the algorithm tolerates multiple failures even in the most extreme failure patterns. Additionally, we monitored the trade-off between fault tolerance and average latency in faulty cases, as a measure of the performance degradation. The algorithm detects interconnect partitioning and enables "preferred paths" for streaming applications.
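The precondition "as long as a path exists" and the notion of interconnect partitioning can be made concrete with a small sketch. The code below is not the routing algorithm of the paper, which is distributed and table-free and whose details the abstract does not give. It is a centralized breadth-first search, used here only to illustrate when a faulty 2D mesh still connects two cores and when failures have partitioned it. The names `path_exists` and `link`, the (x, y) coordinate scheme, and the representation of failures as sets are all assumptions made for this illustration.

```python
from collections import deque


def link(a, b):
    """Hypothetical helper: an undirected link between two adjacent nodes."""
    return frozenset((a, b))


def neighbors(node, width, height):
    """Yield the up-to-four mesh neighbors of node that lie inside the grid."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)


def path_exists(src, dst, width, height,
                failed_nodes=frozenset(), failed_links=frozenset()):
    """Return True if dst is reachable from src in a width x height 2D mesh
    after removing the given failed nodes and failed links (illustrative BFS,
    not the paper's routing algorithm)."""
    if src in failed_nodes or dst in failed_nodes:
        return False
    seen = {src}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            return True
        for nxt in neighbors(cur, width, height):
            if nxt in seen or nxt in failed_nodes:
                continue  # already visited, or the node itself has failed
            if link(cur, nxt) in failed_links:
                continue  # the link between cur and nxt has failed
            seen.add(nxt)
            queue.append(nxt)
    return False


# A 4x4 mesh whose second column (x = 1) has failed except for one node:
# the mesh is still connected through (1, 3).
almost_cut = {(1, 0), (1, 1), (1, 2)}
connected = path_exists((0, 0), (3, 3), 4, 4, failed_nodes=almost_cut)

# Failing the whole column partitions the mesh into two halves.
full_cut = almost_cut | {(1, 3)}
partitioned = not path_exists((0, 0), (3, 3), 4, 4, failed_nodes=full_cut)
```

In this sketch a destination for which `path_exists` returns False lies in a different partition from the source, which is the situation any fault-tolerant routing scheme has to detect rather than route around.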
