IEEE Transactions on Parallel and Distributed Systems

Software-based rerouting for fault-tolerant pipelined communication

Abstract

This paper presents a software-based approach to fault-tolerant routing in networks using wormhole or virtual cut-through switching. When a message encounters a faulty output link, the local router removes it from the network and delivers it to the messaging layer of the local node's operating system. The message-passing software can then reroute the message, possibly along nonminimal paths. Alternatively, the message may be addressed to an intermediate node, which forwards it to the destination. A message may encounter multiple faults and pass through multiple intermediate nodes. The proposed techniques are applicable to both obliviously and adaptively routed networks. They are specifically targeted toward commercial multiprocessors in which the mean time to repair (MTTR) is much smaller than the mean time between router failures (MTBF), so that it is sufficient to tolerate a maximum of three failures. The paper presents requirements for buffer management, deadlock freedom, and livelock freedom. Simulation results are presented to evaluate the degradation in latency and throughput as a function of the number and distribution of faults. Such an approach has several advantages. Router designs are minimally impacted and thus remain compact and fast. Only messages that encounter faulty components are affected, while the machine is guaranteed continued operation until the faulty components can be replaced. The technique leverages existing network technology, and the concepts are portable across evolving switch and router designs. Therefore, we feel that the technique is a good candidate for incorporation into the next generation of multiprocessor networks.
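The eject-and-readdress idea can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a 2D mesh with oblivious dimension-order (XY) routing, faults modeled as directed links, and a hypothetical messaging-layer policy (`choose_intermediate`) that picks the intermediate node giving the shortest two-leg XY route that avoids every fault the message has already hit. All function names are hypothetical, and flit-level pipelining, buffer management, and deadlock avoidance are not modeled.

```python
from typing import List, Optional, Set, Tuple

Node = Tuple[int, int]    # (x, y) position in a 2D mesh (assumed topology)
Link = Tuple[Node, Node]  # directed physical link (from, to)


def xy_next_hop(cur: Node, dst: Node) -> Node:
    """Dimension-order (XY) routing: correct x first, then y."""
    (cx, cy), (dx, dy) = cur, dst
    if cx != dx:
        return (cx + (1 if dx > cx else -1), cy)
    if cy != dy:
        return (cx, cy + (1 if dy > cy else -1))
    return cur


def xy_path(src: Node, dst: Node) -> List[Node]:
    """Full XY route from src to dst, including both endpoints."""
    path = [src]
    while path[-1] != dst:
        path.append(xy_next_hop(path[-1], dst))
    return path


def route_in_network(src: Node, dst: Node, faulty: Set[Link]) -> Tuple[List[Node], bool]:
    """Router hardware: forward hop by hop. If the next output link is faulty,
    eject the message at the current node (returns path so far and False)."""
    path = [src]
    while path[-1] != dst:
        nxt = xy_next_hop(path[-1], dst)
        if (path[-1], nxt) in faulty:
            return path, False
        path.append(nxt)
    return path, True


def choose_intermediate(cur: Node, dst: Node, known: Set[Link],
                        width: int, height: int) -> Optional[Node]:
    """Hypothetical policy: intermediate node with the shortest two-leg XY
    route (cur -> imd -> dst) that uses none of the known faulty links."""
    best = None
    for x in range(width):
        for y in range(height):
            imd = (x, y)
            route = xy_path(cur, imd) + xy_path(imd, dst)[1:]
            if set(zip(route, route[1:])) & known:
                continue
            if best is None or len(route) < best[0]:
                best = (len(route), imd)
    return None if best is None else best[1]


def deliver(src: Node, dst: Node, faulty: Set[Link],
            width: int, height: int) -> Optional[List[Node]]:
    """Return the nodes the message visits, or None if it cannot be delivered."""
    known: Set[Link] = set()  # faults this message has run into so far
    hops = [src]
    cur, target = src, dst
    while True:
        path, ok = route_in_network(cur, target, faulty)
        hops += path[1:]
        cur = path[-1]
        if ok and cur == dst:
            return hops
        if ok:
            # Reached the intermediate node; its messaging layer forwards to dst.
            target = dst
            continue
        # Ejected at cur: software records the fault and re-addresses the message.
        known.add((cur, xy_next_hop(cur, target)))
        imd = choose_intermediate(cur, dst, known, width, height)
        if imd is None:
            return None
        target = imd
```

For example, `deliver((0, 0), (3, 0), {((1, 0), (2, 0))}, 4, 4)` is ejected at `(1, 0)` and detours through `(1, 1)`, a nonminimal path. Because every route the sketch chooses avoids all faults already known to the message, each ejection reveals a new fault, so the number of reroutes is bounded by the number of faulty links. This is only a property of this toy policy; it does not reproduce the paper's buffer-management, deadlock-freedom, or livelock-freedom requirements.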
