Journal: IEEE Transactions on Emerging Topics in Computing

Reliability Aware Design and Lifetime Management of Computing Platforms



Abstract

Meeting reliability targets at viable cost in the nanometer landscape has become a significant challenge, one that must be addressed in a unified manner from design time to run time. To this end, we propose a holistic reliability-aware design and lifetime management framework concerned (i) at design time, with providing a reliability-enhanced adaptive architecture fabric, and (ii) at run time, with observing and dynamically managing the fabric's wear-out profile such that user-defined Quality-of-Service requirements are fulfilled, and with maintaining a full-life reliability log to be used as auxiliary information during the design of the next IC generation. After introducing our framework and the general philosophy behind it, we delve into its key components. Specifically, we first introduce design-time transistor-level and circuit-level aging models, which provide the foundation for a 4-dimensional Design Space Exploration (DSE) meant to identify a reliability-optimized circuit realization compliant with area, power, and delay constraints. Subsequently, to enable the creation of a low-cost yet accurate fabric observation infrastructure, we propose a methodology to minimize the number of aging sensors deployed in a circuit and identify their locations, and introduce a sensor design able to directly capture the circuit-level amalgamated effects of concomitant degradation mechanisms. Furthermore, to make the information collected from sensors meaningful to the run-time management framework, we introduce a circuit-level model that can estimate the overall circuit aging and predict its End-of-Life based on imprecise sensor measurements, while taking into account the degradation nonlinearities. Finally, to provide more DSE reliability enhancement options, we focus on the realization of reliable processing with unreliable components, and propose a methodology to obtain Error Correction Code protected data processing units with an output error rate smaller than the fabrication technology gate error rate.
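
The abstract does not specify the form of the circuit-level aging model, so the following is only an illustrative sketch of the general idea of predicting End-of-Life from imprecise sensor readings under a nonlinear degradation law. It assumes a power-law degradation curve (shift = a * t^n), which is a common simplification for wear-out mechanisms and not necessarily the model used in the paper. The function names, the sample readings, and the 10% failure threshold are all hypothetical.

```python
import math

def fit_power_law(times, shifts):
    """Least-squares fit of shift = a * t**n in log-log space; returns (a, n).

    Assumed model for illustration only; all inputs must be positive.
    """
    xs = [math.log(t) for t in times]
    ys = [math.log(s) for s in shifts]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                      # slope in log-log space = exponent
    a = math.exp(mean_y - n * mean_x)  # intercept in log-log space = ln(a)
    return a, n

def predict_end_of_life(a, n, max_shift):
    """Time at which the fitted curve a * t**n reaches max_shift."""
    return (max_shift / a) ** (1.0 / n)

# Hypothetical sensor readings: operating hours vs. measured delay shift (%).
# The multiplicative factors stand in for sensor imprecision.
times = [100.0, 200.0, 400.0, 800.0, 1600.0]
true_a, true_n = 0.5, 0.2
noise = [1.02, 0.98, 1.01, 0.99, 1.00]
shifts = [true_a * t ** true_n * e for t, e in zip(times, noise)]

a, n = fit_power_law(times, shifts)
# Hypothetical failure criterion: a 10% delay shift marks End-of-Life.
eol_hours = predict_end_of_life(a, n, max_shift=10.0)
```

Because the exponent n is small, the extrapolated End-of-Life is very sensitive to small errors in the fit, which is one reason a model that explicitly handles measurement imprecision and nonlinearity matters.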

