Cost-effective designs for supporting correct execution and scalable performance in many-core processors.

Abstract

Many-core processors offer new levels of on-chip performance by capitalizing on the increasing rate of device integration. Harnessing the full performance potential of these processors requires that hardware designers not only exploit the advantages, but also consider the problems introduced by the new architectures. Such challenges arise from both the processor's increased structural complexity and the reliability issues of the silicon substrate. In this thesis, we address these challenges in a framework that targets correct execution and performance on three coordinates: (1) tolerating permanent faults, (2) facilitating static and dynamic verification through precise specifications, and (3) designing scalable coherence protocols.

First, we propose CCA, a new design paradigm for increasing the processor's lifetime performance in the presence of permanent faults in cores. CCA chips rely on a reconfiguration mechanism that allows cores to replace faulty components with fault-free structures borrowed from neighboring cores. In contrast with existing solutions for handling hard faults that simply shut down cores, CCA aims to maximize the utilization of defect-free resources and increase the availability of on-chip cores. We implement three-core and four-core CCA chips and demonstrate that they offer a cumulative lifetime performance improvement of up to 65% for industry-representative utilization periods. In addition, we show that CCA benefits systems that employ modular redundancy to guarantee correct execution by increasing their availability.

Second, we target the correctness of the address translation system. Current processors often exhibit design bugs in their translation systems, and we believe one cause for these faults is a lack of precise specifications describing the interactions between address translation and the rest of the memory system, especially memory consistency. We address this aspect by introducing a framework for specifying translation-aware consistency models. As part of this framework, we identify the critical role played by address translation in supporting correct memory consistency implementations. Consequently, we propose a set of invariants that characterizes address translation. Based on these invariants, we develop DVAT, a dynamic verification mechanism for address translation. We demonstrate that DVAT is efficient in detecting translation-related faults, including several that mimic design bugs reported in processor errata. By checking the correctness of the address translation system, DVAT supports dynamic verification of translation-aware memory consistency.

Finally, we address the scalability of translation coherence protocols. Current software-based solutions for maintaining translation coherence adversely impact performance and do not scale. We propose UNITD, a hardware coherence protocol that supports scalable performance and architectural decoupling. UNITD integrates translation coherence within the regular cache coherence protocol, such that TLBs participate in the cache coherence protocol similar to instruction or data caches. We evaluate snooping and directory UNITD coherence protocols on processors with up to 16 cores and demonstrate that UNITD reduces the performance penalty of translation coherence to almost zero.
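The idea of TLBs taking part in cache coherence can be illustrated with a toy model. The sketch below is not the thesis's design: it assumes, for illustration only, that each TLB entry remembers the physical address of the page table entry it was loaded from, and that a write to that address is broadcast to every TLB, which then drops matching entries. The names `TLB`, `Bus`, `fill`, `snoop_invalidate`, and `write` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TLB:
    """Toy TLB; each entry records the physical address of its PTE (assumption)."""
    # virtual page -> (physical page, physical address of the page table entry)
    entries: dict = field(default_factory=dict)

    def fill(self, vpage, ppage, pte_addr):
        self.entries[vpage] = (ppage, pte_addr)

    def lookup(self, vpage):
        entry = self.entries.get(vpage)
        return None if entry is None else entry[0]

    def snoop_invalidate(self, addr):
        """Drop every translation loaded from the written address."""
        stale = [vp for vp, (_, pte) in self.entries.items() if pte == addr]
        for vp in stale:
            del self.entries[vp]
        return len(stale)


class Bus:
    """Hypothetical snooping bus that broadcasts invalidations to all TLBs."""

    def __init__(self, tlbs):
        self.tlbs = tlbs

    def write(self, addr):
        # A write to memory is seen by every TLB, as it would be by every cache.
        return sum(t.snoop_invalidate(addr) for t in self.tlbs)


# Four cores cache the same translation; core 0 also caches a second one.
tlbs = [TLB() for _ in range(4)]
for t in tlbs:
    t.fill(vpage=0x10, ppage=0x80, pte_addr=0x1000)
tlbs[0].fill(vpage=0x11, ppage=0x81, pte_addr=0x1008)

bus = Bus(tlbs)
dropped = bus.write(0x1000)  # the OS updates the PTE stored at 0x1000
```

In this model no software shootdown is needed: the write to the page table entry alone removes the stale translation from every core, while unrelated translations stay cached.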

Bibliographic record

Author: Romanescu, Bogdan Florin
Author affiliation: Duke University
Degree-granting institution: Duke University
Subject: Engineering Computer
Degree: Ph.D.
Year: 2010
Pages: 150 p.
Format: PDF
Language: English
