Journal: Journal of Electronic Testing: Theory and Applications

Exploring System Availability During Software-Based Self-Testing of Multi-core CPUs

Abstract

As technology scales, the increased vulnerability of modern systems due to unreliable components becomes a major problem in the era of multi-/many-core architectures. Recently, several on-line testing techniques have been proposed that aim to detect wear-out and aging-related defects that can appear during the lifetime of a system. In this work, we first investigate the relation between system test latency and test-time overhead in multi-/many-core systems with a shared Last-Level Cache (LLC) for periodic Software-Based Self-Testing (SBST), under different test scheduling policies. Second, we propose a new methodology that aims to reduce the extra test-related overhead incurred as the system scales up (i.e., as the number of on-chip cores increases). The investigated scheduling policies primarily vary the number of cores concurrently under test in the overall system test session. Our extensive, workload-driven dynamic exploration reveals that there is an inverse relationship between the two test measures: as the number of cores concurrently under test increases, system test latency decreases, but at the cost of significantly increased test time, which sacrifices system availability for the actual workloads. Under given system test latency constraints, which dictate the recovery time in the event of error detection, our exploration framework identifies the scheduling policy under which the overall test-time overhead is minimized and, hence, system availability is maximized. For the evaluation of the proposed techniques, multi-/many-core systems consisting of 16 and 64 cores are explored in a full-system, execution-driven simulation framework running multi-threaded PARSEC workloads.
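The constrained selection described in the abstract can be sketched as follows. This is only a minimal illustration, not the authors' exploration framework: the names `Measurement` and `pick_policy` are hypothetical, and the numbers are made up purely to mimic the inverse relationship between system test latency and test-time overhead. In the paper these quantities come from full-system simulation.

```python
from typing import Dict, NamedTuple, Optional


class Measurement(NamedTuple):
    latency: float   # system test latency: time until every core has been tested
    overhead: float  # total test-time overhead taken away from the workload


def pick_policy(measured: Dict[int, Measurement],
                max_latency: float) -> Optional[int]:
    """Return the number of cores to test concurrently that minimizes
    test-time overhead while meeting the latency constraint.
    Returns None if no policy satisfies the constraint."""
    feasible = {k: m for k, m in measured.items() if m.latency <= max_latency}
    if not feasible:
        return None
    return min(feasible, key=lambda k: feasible[k].overhead)


# Hypothetical values for a 16-core system (arbitrary units, NOT from the paper).
# Key = number of cores concurrently under test.
example = {
    1: Measurement(latency=16.0, overhead=1.0),
    2: Measurement(latency=8.5, overhead=1.3),
    4: Measurement(latency=4.8, overhead=1.9),
    8: Measurement(latency=3.0, overhead=3.1),
    16: Measurement(latency=2.2, overhead=5.6),
}

print(pick_policy(example, max_latency=5.0))  # prints 4
```

With these illustrative numbers, a tighter latency constraint pushes the choice toward more cores under test at once, and therefore toward higher overhead and lower availability.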