ACM Transactions on Architecture and Code Optimization

Disjoint Out-of-Order Execution Processor



Abstract

High-performance superscalar architectures used to exploit instruction-level parallelism in single-thread applications have become too complex and power-hungry for the multicore era. We propose a new architecture that uses multiple small, latency-tolerant out-of-order cores to improve single-thread performance. Improving single-thread performance with multiple small out-of-order cores lets designers place more of these cores on the same die. Consequently, emerging highly parallel applications can take full advantage of the parallel multicore hardware without sacrificing the performance of inherently serial, hard-to-parallelize applications. Our architecture combines speculative multithreading (SpMT) with checkpoint recovery and continual flow pipeline architectures. It splits single-thread program execution into disjoint control and data threads that execute concurrently on multiple cooperating small, latency-tolerant out-of-order cores. Hence we call this style of execution Disjoint Out-of-Order Execution (DOE). DOE uses latency tolerance to overcome the performance problems that interthread data dependences cause in SpMT. To evaluate this architecture, we developed a microarchitectural performance model of DOE based on PTLSim, a simulation infrastructure for the x86 instruction set architecture. We evaluate the potential performance of the DOE processor architecture using a simple heuristic that forks control-independent threads in hardware at the target addresses of future procedure return instructions. Using applications from SpecInt 2000, we study DOE under both ideal and realistic architectural constraints. We discuss the performance impact of key DOE architecture and application variables, such as the number of cores, interthread data dependences, intercore data communication delay, buffer capacity, and branch mispredictions.
Without any DOE-specific compiler optimizations, our results show that DOE outperforms conventional SpMT architectures by 15% on average. We also show that DOE with four small cores performs, on average, as well as a large superscalar core while consuming about the same power. Most importantly, when running multitasking applications, DOE achieves significantly higher throughput than a large superscalar core, up to 2.5 times as much.
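The fork heuristic mentioned in the abstract spawns a control-independent thread at the target of a future procedure return. Its effect on the dynamic instruction stream can be illustrated with a minimal Python sketch. The sketch rests on several assumptions. It takes an already-recorded trace made of hypothetical `Instr` records with a fixed 4-byte instruction length. It partitions that trace offline into disjoint chunks and maps each thread to a core round-robin. The names `split_at_return_targets` and `max_threads`, and the round-robin mapping, are illustrative inventions, not the paper's mechanism. DOE itself forks speculatively in hardware at run time.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instr:
    """One dynamic instruction from a recorded trace (hypothetical format)."""
    pc: int
    kind: str = "other"  # "call", "ret", or "other"
    length: int = 4      # assumed fixed encoding length; real x86 is variable-length


@dataclass
class Thread:
    """A disjoint chunk of the dynamic instruction stream."""
    tid: int
    core: int            # core chosen round-robin (an assumption, not DOE's policy)
    start_pc: int
    pcs: List[int] = field(default_factory=list)


def split_at_return_targets(trace: List[Instr], num_cores: int,
                            max_threads: int) -> List[Thread]:
    """Partition a trace by forking a thread at each call's return target.

    The callee body stays in the thread that executed the call. The code after
    the matching return becomes a new thread that starts at call_pc + length.
    Threads are listed in creation order, which is not necessarily program order.
    """
    if not trace:
        return []
    threads = [Thread(tid=0, core=0, start_pc=trace[0].pc)]
    current = threads[0]
    # One entry per open call: the forked thread to resume after its return,
    # or None if the thread budget was exhausted and no fork happened.
    pending: List[Optional[Thread]] = []
    switch_to: Optional[Thread] = None

    for ins in trace:
        if switch_to is not None:       # first instruction after a matched return
            current, switch_to = switch_to, None
        current.pcs.append(ins.pc)

        if ins.kind == "call":
            if len(threads) < max_threads:
                child = Thread(tid=len(threads),
                               core=len(threads) % num_cores,
                               start_pc=ins.pc + ins.length)
                threads.append(child)
                pending.append(child)
            else:
                pending.append(None)
        elif ins.kind == "ret" and pending:
            child = pending.pop()
            if child is not None:
                switch_to = child
    return threads


if __name__ == "__main__":
    trace = [
        Instr(0x100), Instr(0x104, "call"),   # call; return target is 0x108
        Instr(0x200), Instr(0x204, "ret"),    # callee body
        Instr(0x108), Instr(0x10C),           # continuation after the return
    ]
    for t in split_at_return_targets(trace, num_cores=4, max_threads=4):
        print(t.tid, t.core, hex(t.start_pc), [hex(p) for p in t.pcs])
```

In the demo, thread 0 keeps the call and the callee body, and thread 1 begins at the return target 0x108. Nested calls are handled by the stack of pending forks, so each dynamic instruction lands in exactly one thread. A hardware implementation in the spirit of DOE would additionally need return-target prediction, checkpoints to squash misspeculated threads, and delivery of interthread register and memory values. This sketch models none of these.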
