Cost-effective designs for supporting correct execution and scalable performance in many-core processors.

Abstract

Many-core processors offer new levels of on-chip performance by capitalizing on the increasing rate of device integration. Harnessing the full performance potential of these processors requires that hardware designers not only exploit the advantages, but also consider the problems introduced by the new architectures. Such challenges arise from both the processor's increased structural complexity and the reliability issues of the silicon substrate. In this thesis, we address these challenges in a framework that targets correct execution and performance on three coordinates: (1) tolerating permanent faults, (2) facilitating static and dynamic verification through precise specifications, and (3) designing scalable coherence protocols.

First, we propose CCA, a new design paradigm for increasing the processor's lifetime performance in the presence of permanent faults in cores. CCA chips rely on a reconfiguration mechanism that allows cores to replace faulty components with fault-free structures borrowed from neighboring cores. In contrast with existing solutions for handling hard faults that simply shut down cores, CCA aims to maximize the utilization of defect-free resources and increase the availability of on-chip cores. We implement three-core and four-core CCA chips and demonstrate that they offer a cumulative lifetime performance improvement of up to 65% for industry-representative utilization periods. In addition, we show that CCA benefits systems that employ modular redundancy to guarantee correct execution by increasing their availability.

Second, we target the correctness of the address translation system. Current processors often exhibit design bugs in their translation systems, and we believe one cause for these faults is a lack of precise specifications describing the interactions between address translation and the rest of the memory system, especially memory consistency. We address this aspect by introducing a framework for specifying translation-aware consistency models. As part of this framework, we identify the critical role played by address translation in supporting correct memory consistency implementations. Consequently, we propose a set of invariants that characterizes address translation. Based on these invariants, we develop DVAT, a dynamic verification mechanism for address translation. We demonstrate that DVAT is efficient in detecting translation-related faults, including several that mimic design bugs reported in processor errata. By checking the correctness of the address translation system, DVAT supports dynamic verification of translation-aware memory consistency.

Finally, we address the scalability of translation coherence protocols. Current software-based solutions for maintaining translation coherence adversely impact performance and do not scale. We propose UNITD, a hardware coherence protocol that supports scalable performance and architectural decoupling. UNITD integrates translation coherence within the regular cache coherence protocol, such that TLBs participate in the cache coherence protocol similar to instruction or data caches. We evaluate snooping and directory UNITD coherence protocols on processors with up to 16 cores and demonstrate that UNITD reduces the performance penalty of translation coherence to almost zero.
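The idea of TLBs taking part in the ordinary coherence protocol can be illustrated with a toy software model. The sketch below is not the thesis's design: the class names (`TLB`, `Bus`), the method names, and in particular the choice to tag each TLB entry with the physical address of its page table entry are assumptions made for illustration only. It shows one plausible way a write to a page table entry could invalidate stale translations through a coherence-style broadcast rather than through a software shootdown.

```python
# Toy model only. All names here are hypothetical and do not come from the thesis.
from dataclasses import dataclass, field


@dataclass
class TLB:
    """Per-core TLB whose entries remember where their page table entry lives."""

    # virtual page number -> (physical frame number, physical address of the PTE)
    # Tagging entries with the PTE address is an assumption of this sketch.
    entries: dict = field(default_factory=dict)

    def fill(self, vpn, pfn, pte_addr):
        """Cache a translation together with the address of its PTE."""
        self.entries[vpn] = (pfn, pte_addr)

    def lookup(self, vpn):
        """Return the cached frame number, or None on a TLB miss."""
        hit = self.entries.get(vpn)
        return None if hit is None else hit[0]

    def snoop_invalidate(self, addr):
        """Drop every translation whose PTE lives at the invalidated address."""
        stale = [vpn for vpn, (_, pte_addr) in self.entries.items()
                 if pte_addr == addr]
        for vpn in stale:
            del self.entries[vpn]
        return len(stale)


class Bus:
    """Stand-in for the coherence fabric: broadcasts invalidations to all TLBs."""

    def __init__(self, tlbs):
        self.tlbs = tlbs

    def write_pte(self, page_table, pte_addr, new_pfn):
        """Update a PTE in memory and invalidate matching TLB entries everywhere."""
        page_table[pte_addr] = new_pfn
        return sum(tlb.snoop_invalidate(pte_addr) for tlb in self.tlbs)
```

In this model, a core that remaps a page simply calls `write_pte`; every other core's next `lookup` for that page misses and would refetch the new mapping, with no inter-processor interrupt or explicit software flush involved.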

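Bibliographic record

  • Author

    Romanescu, Bogdan Florin

  • Author affiliation

    Duke University

  • Degree-granting institution: Duke University
  • Subject: Engineering Computer
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 150 p.
  • Total pages: 150
  • Original format: PDF
  • Text language: eng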