
Towards a cross-layer framework for wearout monitoring and mitigation.



Abstract

CMOS scaling has enabled a greater degree of integration and higher performance, but it has the undesirable consequence of decreased circuit reliability due to rapid wearout. Accelerated processor wearout and the consequent degradation in lifetime have become a first-order design constraint. This dissertation tackles these challenges by developing new tools to accurately quantify wearout and by providing novel methods to quantify the wearout impact of software interactions on hardware. The dissertation then demonstrates the use of these tools by developing a wearout-aware scheduling approach that achieves wear leveling within a processor.

This dissertation first presents WearMon, an adaptive critical path monitoring architecture that provides an accurate, real-time measure of a processor's wearout-induced timing margin degradation. Special test patterns are used to check a set of critical paths in the circuit under test. Because each test activates the actual devices and signal paths used in normal operation of the chip, it captures the up-to-date timing margin of these paths. The monitoring framework dynamically adapts the testing interval and complexity based on analyses of prior test results, which increases the efficiency and accuracy of monitoring. Monitoring overhead can be completely eliminated by scheduling tests only when the circuit is idle. This wearout detection mechanism is a key building block of a hierarchical runtime reliability management system in which multiple wearout monitoring units can cooperatively engage preemptive error avoidance schemes. Our experimental results, based on an FPGA implementation, show that the proposed monitoring framework can be easily integrated into existing designs and operates with minimal overhead.

WearMon overhead can become a hurdle when a circuit block has a steep critical path timing wall. Many prior research studies intuitively argued that only a few paths within a steep critical path timing wall are actually utilized by application software, but there has been a dearth of tools that enable designers to understand how software impacts the utilization of critical paths in a circuit. The next part of this dissertation develops a tool for cross-layer analysis of wearout, called WAT. WAT uses FPGA emulation closely coupled with software simulation to provide accurate insight into device switching activity and runtime path utilization. We demonstrate the utility of WAT by providing accurate gate-level switching activity statistics as inputs to a lifetime wearout simulation tool, which uses accurate device-level models of the electrophysical phenomena that cause wearout. Accurate switching statistics from WAT can significantly improve lifetime prediction accuracy.

WAT is also used to address the concern regarding WearMon overhead in the presence of steep critical path timing walls. A new design-for-reliability approach is developed that reshapes a critical path wall to make a circuit more amenable to wearout monitoring. This design flow uses the path utilization profile to select only a few paths to be monitored for wearout. We propose and evaluate four novel algorithms for selecting the paths to be monitored; they allow designers to select the best group of paths under varying power, area, and monitoring budget constraints.

Finally, we demonstrate the impact of runtime wearout management with WAS, a proactive runtime wearout-aware scheduling approach. A processor can fail due to wearout of a single structure even if the vast majority of the chip is still operational. WAS strives for uniform wearout of processor structures, thereby preventing any single structure from becoming an early point of failure. Its fine-grained, microarchitecture-level wearout control policies use feedback from a network of timing margin monitoring sensors to identify the most degraded structures. Our evaluation shows that WAS can improve the lifespan of a multi-core processor chip by 15% to 30% with negligible impact on performance and energy consumption.
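WearMon's adaptive testing (changing the test interval and complexity based on prior results, and testing only while idle) can be illustrated with a minimal Python sketch. The abstract does not state the adaptation rule, so the halving/doubling policy, the 50 ps warning margin, the bounds, and the names `MonitorState`, `next_test_plan`, and `should_run_test` are all hypothetical; the sketch only shows the general idea of testing more often, and checking more paths, as the measured margin shrinks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitorState:
    interval: int        # time units until the next test is due (hypothetical unit)
    paths_per_test: int  # number of critical paths the next test exercises


def next_test_plan(state: MonitorState, measured_margin_ps: float,
                   warn_margin_ps: float = 50.0,
                   min_interval: int = 1_000, max_interval: int = 1_000_000,
                   min_paths: int = 4, max_paths: int = 64) -> MonitorState:
    """Adapt test interval and complexity from the latest measured timing margin.

    Hypothetical rule: when the margin is at or below the warning level, test
    twice as often and check twice as many paths; otherwise back off by the
    same factor to reduce monitoring overhead. All thresholds are assumptions.
    """
    if measured_margin_ps <= warn_margin_ps:
        interval = max(min_interval, state.interval // 2)
        paths = min(max_paths, state.paths_per_test * 2)
    else:
        interval = min(max_interval, state.interval * 2)
        paths = max(min_paths, state.paths_per_test // 2)
    return MonitorState(interval, paths)


def should_run_test(elapsed: int, state: MonitorState, circuit_idle: bool) -> bool:
    """Run a due test only while the circuit is idle, so normal operation is not stalled."""
    return elapsed >= state.interval and circuit_idle


if __name__ == "__main__":
    s = MonitorState(interval=8_000, paths_per_test=8)
    s = next_test_plan(s, measured_margin_ps=120.0)  # ample margin: back off
    print(s)  # MonitorState(interval=16000, paths_per_test=4)
    s = next_test_plan(s, measured_margin_ps=30.0)   # margin shrinking: test harder
    print(s)  # MonitorState(interval=8000, paths_per_test=8)
```

Gating on `circuit_idle` mirrors the abstract's point that overhead disappears when tests run only during idle periods; a due test simply waits until the next idle window.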
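The wear-leveling goal of WAS (steering work away from the most degraded structures, as identified by timing margin sensors) can be sketched as a greedy placement loop. This is a hypothetical simplification, not the dissertation's policy: it reduces each structure's sensor feedback to one remaining-margin number, sends each job to the least-degraded structure, and deducts an assumed fixed per-job wear cost. The names `pick_least_worn` and `assign_jobs` are illustrative.

```python
from __future__ import annotations


def pick_least_worn(margins_ps: dict[str, float]) -> str:
    """Return the structure with the largest remaining timing margin (ties: first listed)."""
    return max(margins_ps, key=lambda name: margins_ps[name])


def assign_jobs(jobs: list[str], margins_ps: dict[str, float],
                wear_per_job_ps: float) -> tuple[dict[str, str], dict[str, float]]:
    """Greedy wear leveling: place each job on the least-degraded structure.

    After each placement the chosen structure's margin is reduced by a fixed,
    hypothetical wear cost, so later jobs drift toward other structures.
    The input dict is not modified.
    """
    margins = dict(margins_ps)
    placement: dict[str, str] = {}
    for job in jobs:
        target = pick_least_worn(margins)
        placement[job] = target
        margins[target] -= wear_per_job_ps
    return placement, margins


if __name__ == "__main__":
    sensors = {"core0": 100.0, "core1": 90.0, "core2": 95.0}
    placement, after = assign_jobs(["a", "b", "c"], sensors, wear_per_job_ps=10.0)
    print(placement)  # {'a': 'core0', 'b': 'core2', 'c': 'core0'}
    print(after)      # {'core0': 80.0, 'core1': 90.0, 'core2': 85.0}
```

In this toy run, the most degraded structure (`core1`) receives no work, so the margins converge instead of one structure failing early. The loop does not model the performance and energy costs that the abstract reports as negligible for WAS.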

Bibliographic details

  • Author

    Zandian, Bardia

  • Affiliation

    University of Southern California

  • Degree-granting institution: University of Southern California
  • Subject: Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 181
  • Original format: PDF
  • Language: English
