Conference: IEEE International High Level Design Validation and Test Workshop

Cardio: Adaptive CMPs for reliability through dynamic introspective operation

Abstract

Current technology scaling enables the integration of tens of processing elements into a single chip, and future technology nodes will soon allow the integration of hundreds of cores per device. Although these systems are very powerful, many experts agree that they will be prone to a significant number of permanent and transient faults during their lifetime. If not properly handled, the effects of runtime failures can be dramatic. In this work, we propose Cardio, a distributed architecture for reliable chip multiprocessors. Cardio is a novel approach to on-chip reliability, based on hardware detectors that spot failures and on software routines that reorganize the system to work around faulty components. Compared to previous online reliability solutions, Cardio provides failure reactivity comparable to hardware-only reliable solutions while requiring a much lower area overhead. Cardio operates a distributed resource manager to collect health information about components and leverages a robust distributed control mechanism to manage system-level recovery. Our architecture remains operational as long as at least one general-purpose processor is still functional in the chip. We evaluated our design using a custom simulator and estimate its runtime impact on the SPECMPI benchmarks to be lower than 3%. We estimate its dynamic reconfiguration time to be between 20,000 and 50,000 cycles per failure.
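
The detect-then-reorganize idea described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: the abstract gives no implementation details, so the `ResourceManager` class, its method names, and the least-loaded placement policy are all assumptions made for illustration. The sketch is also centralized for brevity, whereas Cardio's manager is distributed.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Toy, centralized model (hypothetical; Cardio's manager is distributed)."""
    num_cores: int
    healthy: set = field(init=False)                # ids of cores believed functional
    placement: dict = field(default_factory=dict)   # task name -> core id

    def __post_init__(self):
        # Assumption: every core starts out healthy.
        self.healthy = set(range(self.num_cores))

    def assign(self, task):
        """Place a task on the least-loaded healthy core (assumed policy)."""
        if not self.healthy:
            raise RuntimeError("no functional core left")
        load = {core: 0 for core in self.healthy}
        for core in self.placement.values():
            if core in load:
                load[core] += 1
        # Ties are broken by the lowest core id so the result is deterministic.
        core = min(sorted(load), key=load.get)
        self.placement[task] = core
        return core

    def report_fault(self, core):
        """Stand-in for a hardware detector flagging a core as faulty."""
        self.healthy.discard(core)
        displaced = [t for t, c in self.placement.items() if c == core]
        for task in displaced:
            del self.placement[task]
        # Software-side recovery: move displaced tasks to surviving cores.
        for task in displaced:
            self.assign(task)
        return displaced


rm = ResourceManager(num_cores=4)
for name in "abcd":
    rm.assign(name)
moved = rm.report_fault(1)   # core 1 is flagged as faulty
print(moved, rm.placement)
```

In this toy model, recovery keeps working until the last healthy core is lost, which loosely mirrors the abstract's statement that the architecture remains operational as long as at least one general-purpose processor is functional.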