
A Multipath Fault-Tolerant Routing Method for High-Speed Interconnection Networks

Abstract

The intensive and continuous use of high-performance computers for executing computationally intensive applications, coupled with the large number of elements that make them up, dramatically increases the likelihood of failures during their operation.

The interconnection network is a critical part of high-performance computer systems: it links the processing units and carries the communication between them. Network faults have an extremely high impact because a single fault may prevent applications from finishing correctly.

This work focuses on the problem of fault tolerance for high-speed interconnection networks by designing a fault-tolerant routing method. The goal is to tolerate a certain number of link and node failures, taking into account their impact and probability of occurrence. To accomplish this, we exploit communication path redundancy by means of adaptive multipath routing approaches that fulfill the four phases of fault tolerance: error detection, damage confinement, error recovery, and fault treatment and continuous service. Experiments show that our method allows applications to complete their execution in the presence of several faults, with an average performance of 97% relative to the fault-free scenarios.

Bibliographic record

  • Source
    Euro-Par 2009 Parallel Processing | 2009 | pp. 1078-1088 | 11 pages
  • Conference location: Delft (NL)
  • Author affiliations

    Computer Architecture and Operating Systems Department, University Autonoma of Barcelona, Spain;

    Computer Architecture and Operating Systems Department, University Autonoma of Barcelona, Spain;

    Computer Architecture and Operating Systems Department, University Autonoma of Barcelona, Spain;

    Computer Architecture and Operating Systems Department, University Autonoma of Barcelona, Spain;

  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: distributed operating systems; parallel operating systems
