Layered approach for runtime fault recovery in NoC-based MPSoCs

Abstract

Fault-tolerance mechanisms are mandatory in MPSoCs to cope with fabrication defects and with faults that occur during the product lifetime. For instance, permanent faults in the interconnection network can stall or crash applications even when the MPSoC's network has alternative fault-free paths to a given destination. Runtime fault tolerance provides self-organization mechanisms that let the system continue delivering its processing services despite cores made defective by permanent and/or transient faults throughout its lifetime. This thesis presents a runtime layered approach to a fault-tolerant MPSoC, where each layer is responsible for solving one part of the problem. The approach is built on top of a novel, small, specialized network used to search for fault-free paths. The first layer, the physical layer, is responsible for fault detection and for isolating defective routers. The second layer, the network layer, is responsible for replacing the original faulty path with an alternative fault-free path: a fault-tolerant routing method executes a path search mechanism and reconfigures the network to use the fault-free path. The third layer, the transport layer, implements a fault-tolerant communication protocol that triggers the path search in the network layer when a packet does not reach its destination. The last layer, the application layer, is responsible for moving tasks from a defective processing element (PE) to a healthy PE, saving the task's internal state, and restoring it if a fault occurs while the task is executing.

Results at the network layer show a fast path-finding method: the entire process of finding an alternative path typically takes less than 2000 clock cycles, or 20 microseconds. In the transport layer, different approaches capable of detecting a lost message and starting its retransmission were evaluated. The results show that the overhead to retransmit a message is 2.46X compared to the time to transmit a message without a fault, while all other messages are transmitted with no overhead. For the DTW, MPEG, and synthetic applications, the average-case application execution overhead was 0.17%, 0.09%, and 0.42%, respectively. This represents less than 5% of the worst-case application execution overhead. At the application layer, the entire fault recovery protocol executes fast, with a low execution time overhead both without faults (5.67%) and with faults (17.33% - 28.34%).
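The network-layer idea of replacing a faulty path with a fault-free one can be illustrated in software. The sketch below is not the thesis's mechanism, which relies on a dedicated search network in hardware; it assumes a 2D mesh topology and uses a plain breadth-first search, and the function name `find_fault_free_path` and its parameters are hypothetical.

```python
from collections import deque


def find_fault_free_path(width, height, faulty, src, dst):
    """Breadth-first search on a width x height mesh (assumed topology).

    `faulty` is a set of (x, y) router coordinates to avoid.
    Returns the list of hops from src to dst, or None if no
    fault-free path exists.
    """
    if src in faulty or dst in faulty:
        return None
    parent = {src: None}  # also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the parent links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = nxt
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and nxt not in faulty and nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    # 3x3 mesh where the router between source and destination is faulty.
    print(find_fault_free_path(3, 3, {(1, 0)}, (0, 0), (2, 0)))
```

In this example the direct route through router (1, 0) is blocked, so the search detours through the second row of the mesh. A transport-layer protocol like the one described above would invoke such a search only after detecting that a packet did not reach its destination.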

Bibliographic record

  • Author

    Wächter Eduardo Weber;

  • Year: 2015
  • Original format: PDF
  • Text language: Portuguese
