Conference: Design, Automation & Test in Europe Conference & Exhibition (DATE 2013)

Communication and migration energy aware design space exploration for multicore systems with intermittent faults

Abstract

Shrinking transistor geometries, aggressive voltage scaling and higher operating frequencies have degraded the dependability of embedded multicore systems. Most existing research on fault tolerance has focused on transient and permanent faults of cores. Intermittent faults are a separate class of defects that result from on-chip temperature, pressure and voltage variations and last from a few cycles to several seconds or more. A core affected by an intermittent fault suspends operation for the duration of the fault and resumes when conditions become favorable again. This paper proposes a technique to model the availability of multiprocessor systems-on-chip (MPSoCs) with intermittent and repairable device defects. The model is based on a Markov chain with a stochastic fault distribution and can also be applied to permanent faults. Based on this model, a design space pruning technique is proposed that selects a set of task mappings (with variable resource usage) minimizing task communication energy while satisfying the MPSoC availability constraint. Task migration overhead is also minimized, which is an important consideration for frequently occurring intermittent and temperature-related faults, where prolonged system downtime during task re-mapping is undesirable. Experiments with real-life and synthetic application task graphs demonstrate that the proposed technique reduces communication energy by 30% and migration overhead by 50% compared to existing approaches.
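
The abstract does not give the details of the Markov model or the pruning procedure, so the following is only a minimal sketch of the general idea, under assumptions that are not taken from the paper: each core is treated as an independent two-state (up/down) Markov chain with a constant fault rate and recovery rate, a mapping that uses k cores is considered available when at least k of the n cores are up, and candidate mappings are filtered by an availability bound before the lowest communication energy is chosen. All function names, dictionary keys and numbers are hypothetical.

```python
from math import comb


def core_availability(fault_rate: float, recovery_rate: float) -> float:
    """Steady-state probability that a two-state (up/down) Markov chain is up.

    Assumption: faults arrive at `fault_rate` and clear at `recovery_rate`.
    """
    return recovery_rate / (fault_rate + recovery_rate)


def system_availability(n_cores: int, k_required: int, a_core: float) -> float:
    """Probability that at least `k_required` of `n_cores` cores are up.

    Assumption: cores fail and recover independently with the same availability.
    """
    return sum(
        comb(n_cores, i) * a_core**i * (1.0 - a_core) ** (n_cores - i)
        for i in range(k_required, n_cores + 1)
    )


def pick_mapping(candidates, n_cores, a_core, min_availability):
    """Return the lowest-energy candidate that meets the availability bound.

    Each candidate is a dict with hypothetical keys "cores_used" and
    "comm_energy". Returns None when no candidate is feasible.
    """
    feasible = [
        c
        for c in candidates
        if system_availability(n_cores, c["cores_used"], a_core) >= min_availability
    ]
    return min(feasible, key=lambda c: c["comm_energy"], default=None)


if __name__ == "__main__":
    a = core_availability(fault_rate=1.0, recovery_rate=9.0)
    mappings = [  # made-up values, for illustration only
        {"name": "m4", "cores_used": 4, "comm_energy": 10.0},
        {"name": "m3", "cores_used": 3, "comm_energy": 14.0},
        {"name": "m2", "cores_used": 2, "comm_energy": 20.0},
    ]
    print(pick_mapping(mappings, n_cores=4, a_core=a, min_availability=0.9))
```

With these made-up inputs, the four-core mapping is rejected because all four cores are up only about 66% of the time, and the three-core mapping is the cheapest one that still meets the 0.9 bound. Migration overhead, which the paper also minimizes, is not modeled in this sketch.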