Multiple clock domain microarchitecture design and analysis.

Abstract

As clock frequency increases and feature size decreases, clock distribution and skew tolerance present growing challenges to the designers of singly-clocked, globally synchronous processors. We describe a globally-asynchronous, locally-synchronous (GALS) approach, which we call a Multiple Clock Domain (MCD) processor, in which the chip is divided into several clock domains, within which independent voltage and frequency scaling can be performed. Boundaries between domains are chosen to exploit existing queues, thereby minimizing inter-domain synchronization costs. We propose four clock domains corresponding to the front end (including L1 instruction cache), integer units, floating-point units, and load-store units (including L1 data cache and unified L2 cache).

In addition, we quantify the potential energy savings of a specific MCD processor based on the Alpha 21264 microprocessor, using off-line analysis of traces from a broad range of applications. With the results from this off-line algorithm as a benchmark, we describe the design, analysis and performance of a realistic on-line frequency/voltage control algorithm which achieves on average a 19.0% reduction in Energy Per Instruction (EPI), a 3.2% increase in Cycles Per Instruction (CPI), and a 16.7% improvement in the Energy-Delay product, with a Power Savings to Performance Degradation ratio of 4.6. This Energy-Delay product improvement is 85.5% of what was achieved using the off-line algorithm. All of our results (from both the off-line and on-line algorithms) were achieved using a broad mix of compute-bound, memory-bound, and rate-based applications from the MediaBench, Olden, and Spec2000 benchmark suites.

We also demonstrate that the inherent characteristics of an MCD microarchitecture allow internal processor complexity to be dynamically traded for frequency on a per-domain basis. Simply configuring the MCD processor once per application increases performance by 17.6%, on average, compared to the best fully synchronous design. When adapting to application phases, performance improves by 20.4%.

These techniques provide an enabling technology which will allow future processor designs to achieve higher levels of scalability, performance, and energy efficiency than would otherwise be possible with a monolithic synchronous processor.
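The relationship between the reported EPI, CPI, and Energy-Delay figures can be roughly checked with a short calculation. The sketch below is illustrative only and is not taken from the thesis: the function name `relative_energy_delay` is hypothetical, and it assumes the CPI increase can be treated as the relative increase in execution time. Composing the averaged figures this way gives about a 16.4% improvement, close to but not identical to the reported 16.7%, which is expected if the reported value was averaged per application rather than derived from the averaged EPI and CPI.

```python
def relative_energy_delay(epi_change: float, cpi_change: float) -> float:
    """Return the fractional change in the energy-delay product.

    Both arguments are fractional changes, e.g. -0.19 for a 19% reduction.
    Assumption (not from the source): delay scales directly with CPI,
    so energy-delay scales as (energy per instruction) * (time per instruction).
    """
    return (1.0 + epi_change) * (1.0 + cpi_change) - 1.0


# Averages quoted in the abstract: 19.0% lower EPI, 3.2% higher CPI.
change = relative_energy_delay(-0.190, 0.032)
print(f"Energy-delay change: {change:.1%}")
```

A negative result denotes an improvement (a lower energy-delay product).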