Journal: Concurrency and Computation

A fault-tolerant routing strategy for k-ary n-direct s-indirect topologies based on intermediate nodes

Abstract

Exascale computing systems are being built with thousands of nodes. The large number of components in these systems significantly increases the probability of failure. A key component of these systems is the interconnection network: if failures occur in it, they may isolate a large fraction of the machine. For this reason, an efficient fault-tolerant mechanism is needed to keep the system interconnected even in the presence of faults. A recently proposed topology for these large systems is the hybrid k-ary n-direct s-indirect family, which provides optimal performance and connectivity at a reduced hardware cost. This paper presents a fault-tolerant routing methodology for the k-ary n-direct s-indirect topology that degrades performance gracefully in the presence of faults and tolerates a large number of faults without disabling any healthy computing node. To tolerate network failures, the methodology uses a simple mechanism: for any source-destination pair, if necessary, packets are forwarded to the destination node through a set of intermediate nodes (without being ejected from the network) in order to circumvent the faults. The evaluation results show that the proposed methodology tolerates a large number of faults. For instance, with 10 faults in a 3-D network with 1000 nodes, it tolerates more than 99.5% of fault combinations using only 1 intermediate node, and more than 99.98% if 2 intermediate nodes are used. Furthermore, the methodology offers graceful performance degradation: for example, performance degrades by only 1% for a 2-D network with 1024 nodes and 1% faulty links.
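The intermediate-node mechanism can be illustrated with a minimal, hypothetical Python model. This is not the paper's implementation. It assumes dimension-order routing (DOR) as the base routing, abstracts each dimension's s-stage indirect subnetwork as a single switch that reaches all k nodes along that dimension, and models a fault only as a failed link between a node and one of those switches (faults inside the switches are ignored). The names `dor_hops`, `leg_ok`, and `find_route` are illustrative. A route is accepted when every DOR leg between consecutive waypoints avoids all faulty links, and routes with fewer intermediate nodes are tried first, mirroring the abstract's comparison of 1 versus 2 intermediate nodes.

```python
from itertools import product
from typing import Optional

# Hypothetical simplified model; not the paper's exact topology or algorithm.
# A node is a tuple of n coordinates, each in [0, k).
Node = tuple[int, ...]
# A fault is (node, dim): the link between `node` and its dimension-`dim` switch is down.
Fault = tuple[Node, int]


def dor_hops(src: Node, dst: Node) -> list[tuple[Node, Node, int]]:
    """Dimension-order route: correct coordinates in order 0..n-1.

    Each dimension's indirect subnetwork is abstracted as one switch, so fixing
    one coordinate is a single hop (upstream node, downstream node, dimension).
    """
    hops = []
    cur = src
    for d in range(len(src)):
        if cur[d] != dst[d]:
            nxt = cur[:d] + (dst[d],) + cur[d + 1:]
            hops.append((cur, nxt, d))
            cur = nxt
    return hops


def leg_ok(src: Node, dst: Node, faults: set[Fault]) -> bool:
    """True if the DOR path src -> dst uses no faulty node-to-switch link."""
    return all((a, d) not in faults and (b, d) not in faults
               for a, b, d in dor_hops(src, dst))


def find_route(src: Node, dst: Node, k: int, n: int, faults: set[Fault],
               max_intermediates: int = 2) -> Optional[list[Node]]:
    """Return [src, I1, ..., Im, dst] (m <= max_intermediates) whose DOR legs all
    avoid faults, preferring fewer intermediate nodes; None if no such route exists.

    The packet is assumed to stay in the network at each Ii; only its target changes.
    """
    nodes = list(product(range(k), repeat=n))
    for m in range(max_intermediates + 1):  # m = 0 is the plain DOR route
        for mids in product(nodes, repeat=m):
            waypoints = [src, *mids, dst]
            if all(leg_ok(a, b, faults) for a, b in zip(waypoints, waypoints[1:])):
                return waypoints
    return None


if __name__ == "__main__":
    # 4-ary 2-D example: node (2, 0) has lost its link to its dimension-1 switch,
    # so the direct DOR path (0, 0) -> (2, 0) -> (2, 3) is broken.
    faults = {((2, 0), 1)}
    print(find_route((0, 0), (2, 3), k=4, n=2, faults=faults))
    # prints [(0, 0), (0, 1), (2, 3)]
```

The exhaustive search grows as (k^n)^m for m intermediate nodes and is only meant to make the idea concrete; how the paper actually selects intermediate nodes, and how it keeps the multi-leg routes deadlock-free, is not modeled here.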