Source: Computer Architecture News

Architectural Core Salvaging in a Multi-Core Processor for Hard-Error Tolerance


Abstract

The incidence of hard errors in CPUs is a challenge for future multicore designs due to increasing total core area. Even if the location and nature of hard errors are known a priori, either at manufacture time or in the field, cores with such errors must be disabled in the absence of hard-error tolerance. While caches, with their regular and repetitive structures, are easily covered against hard errors by providing spare arrays or spare lines, structures within a core are neither as regular nor as repetitive. Previous work has proposed microarchitectural core salvaging to exploit structural redundancy within a core and maintain functionality in the presence of hard errors. Unfortunately, microarchitectural salvaging introduces complexity and may provide only limited coverage of core area against hard errors due to a lack of natural redundancy in the core.

This paper makes a case for architectural core salvaging. We observe that even if some individual cores cannot execute certain operations, a CPU die can be instruction-set-architecture (ISA) compliant, that is, execute all of the instructions required by its ISA, by exploiting natural cross-core redundancy. We propose using hardware to migrate offending threads to another core that can execute the operation. Architectural core salvaging can cover a large core area against faults, and be implemented by leveraging known techniques that minimize changes to the microarchitecture. We show it is possible to optimize architectural core salvaging such that the performance on a faulty die approaches that of a fault-free die, assuring significantly better performance than core disabling for many workloads and no worse performance than core disabling for the remainder.
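The migration idea in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware mechanism: the `Core` and `Thread` classes, the operation-class names (`int_alu`, `fp_mul`, `load_store`), and the "first capable core" selection policy are all hypothetical, and migration cost and migrating back are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    supported: frozenset  # operation classes this core can still execute


@dataclass
class Thread:
    name: str
    ops: list        # sequence of operation-class names to execute
    core_id: int     # core the thread currently runs on
    migrations: int = 0


def die_is_isa_compliant(cores, isa):
    """True if every ISA operation class runs on at least one core."""
    covered = set()
    for core in cores:
        covered |= core.supported
    return set(isa) <= covered


def run(thread, cores):
    """Execute the thread, migrating when the current core lacks an op."""
    by_id = {core.core_id: core for core in cores}
    trace = []
    for op in thread.ops:
        if op not in by_id[thread.core_id].supported:
            # Hypothetical policy: pick the first core able to run the op.
            target = next((c for c in cores if op in c.supported), None)
            if target is None:
                raise RuntimeError(f"no core can execute {op!r}")
            thread.core_id = target.core_id
            thread.migrations += 1
        trace.append((op, thread.core_id))
    return trace


# Assumed example: core 0 has a hard error in its FP multiplier.
ISA = {"int_alu", "fp_mul", "load_store"}
cores = [
    Core(0, frozenset({"int_alu", "load_store"})),
    Core(1, frozenset(ISA)),
]
t0 = Thread("t0", ["int_alu", "fp_mul", "int_alu"], core_id=0)
trace = run(t0, cores)
```

In this example the die as a whole remains ISA compliant even though core 0 alone is not, and the thread moves to core 1 only when it reaches the operation that core 0 cannot execute.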
