Journal: Dependable and Secure Computing, IEEE Transactions on

Hardware/Software Codesign Architecture for Online Testing in Chip Multiprocessors

Abstract

As the semiconductor industry continues its relentless push for nano-CMOS technologies, long-term device reliability and the occurrence of hard errors have emerged as major concerns. Long-term device reliability includes parametric degradation, which results in loss of performance, as well as hard failures, which result in loss of functionality. The ITRS roadmap reports that the effectiveness of traditional burn-in testing for product life acceleration is eroding. Thus, to assure sufficient product reliability, fault detection and system reconfiguration must be performed in the field at runtime. Although regular memory structures are protected against hard errors using error-correcting codes, many structures within cores are left unprotected. Several proposed online testing techniques either rely on concurrent testing or periodically check for correctness. These techniques are attractive but limited by significant design effort and hardware cost. Furthermore, the lack of observability and controllability of microarchitectural states results in long latency, long test sequences, and large storage of golden patterns. In this paper, we propose a low-cost scheme for detecting and debugging hard errors at a fine granularity within cores and keeping the faulty cores functional, with potentially reduced capability and performance. The solution includes both hardware and runtime software based on the codesigned virtual machine concept. It can detect, debug, and isolate hard errors in small noncache array structures, execution units, and combinational logic within cores. Hardware signature registers capture the footprint of execution at the output of functional modules within the cores. A runtime layer of software (the microvisor) initiates functional tests concurrently on multiple cores and captures the signature footprints across cores to detect, debug, and isolate hard errors. Results show that, using a targeted set of functional test sequences, faults can be debugged to a fine-granular level within cores. The hardware cost of the scheme is less than three percent, while the software tasks are performed at a high level, resulting in relatively low design effort and cost.
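
The cross-core signature comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the signature register design or the microvisor's comparison rule, so everything here is an assumption made for illustration: the 16-bit register width, the feedback mask `TAPS`, the majority-vote rule, and the names `fold`, `signature_of`, `healthy_add`, `faulty_add`, and `find_suspects` are all hypothetical. Each "core" is modeled as a plain function standing in for an execution unit.

```python
from collections import Counter

MASK = 0xFFFF   # 16-bit signature register (width is an assumption)
TAPS = 0xB400   # illustrative Galois LFSR feedback mask, not from the paper


def fold(signature: int, value: int) -> int:
    """Advance the register one step and XOR in one module output."""
    lsb = signature & 1
    signature >>= 1
    if lsb:
        signature ^= TAPS
    return (signature ^ value) & MASK


def signature_of(unit, vectors) -> int:
    """Run a functional test on one unit and return its compacted footprint."""
    sig = 0
    for a, b in vectors:
        sig = fold(sig, unit(a, b))
    return sig


def healthy_add(a: int, b: int) -> int:
    return (a + b) & MASK


def faulty_add(a: int, b: int) -> int:
    # Hypothetical hard error: output bit 2 stuck at 1.
    return healthy_add(a, b) | 0x0004


def find_suspects(signatures: dict) -> list:
    """Return cores whose signature disagrees with the strict majority."""
    majority, count = Counter(signatures.values()).most_common(1)[0]
    if count <= len(signatures) // 2:
        raise ValueError("no majority signature; cannot isolate the fault")
    return sorted(core for core, sig in signatures.items() if sig != majority)


# The same functional test sequence is applied to every core.
VECTORS = [(1, 2), (3, 4), (0x00FF, 0x0001), (5, 5)]
CORES = {0: healthy_add, 1: healthy_add, 2: faulty_add, 3: healthy_add}

signatures = {core: signature_of(unit, VECTORS) for core, unit in CORES.items()}
suspects = find_suspects(signatures)
print(suspects)  # core 2 is flagged
```

Comparing signatures across cores, rather than against stored reference values, is one way to avoid the golden-pattern storage the abstract mentions as a drawback of earlier techniques. In this sketch a single register covers one unit; narrowing a fault to a specific module would need one signature per observed module output.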
