Disjoint Out-of-Order Execution Processor

MAGEDA SHARAFEDDINE; KOMAL JOTHI; HAITHAM AKKARY



Abstract

High-performance superscalar architectures used to exploit instruction-level parallelism in single-thread applications have become too complex and power hungry for the multicore processor era. We propose a new architecture that uses multiple small latency-tolerant out-of-order cores to improve single-thread performance. Improving single-thread performance with multiple small out-of-order cores allows designers to place more of these cores on the same die. Consequently, emerging highly parallel applications can take full advantage of the multicore parallel hardware without sacrificing the performance of inherently serial and hard-to-parallelize applications. Our architecture combines speculative multithreading (SpMT) with checkpoint recovery and continual flow pipeline architectures. It splits single-thread program execution into disjoint control and data threads that execute concurrently on multiple cooperating small and latency-tolerant out-of-order cores. Hence we call this style of execution Disjoint Out-of-Order Execution (DOE). DOE uses latency tolerance to overcome performance issues of SpMT caused by interthread data dependences. To evaluate this architecture, we have developed a microarchitecture performance model of DOE based on PTLSim, a simulation infrastructure for the x86 instruction set architecture. We evaluate the potential performance of the DOE processor architecture using a simple heuristic to fork control-independent threads in hardware at the target addresses of future procedure return instructions. Using applications from SpecInt 2000, we study DOE under ideal as well as realistic architectural constraints. We discuss the performance impact of key DOE architecture and application variables such as number of cores, interthread data dependences, intercore data communication delay, buffer capacity, and branch mispredictions.
Without any DOE-specific compiler optimizations, our results show that DOE outperforms conventional SpMT architectures by 15% on average. We also show that DOE with four small cores can perform, on average, as well as a large superscalar core while consuming about the same power. Most importantly, when running multitasking applications, DOE improves throughput over a large superscalar core by up to 2.5 times.
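
The fork heuristic named in the abstract (start a new thread at the target address of a future procedure return) can be illustrated with a toy trace walker. This is only a sketch under stated assumptions, not the authors' hardware mechanism or their PTLSim-based model: the `Instr` record, the `select_fork_points` function, the rule that a core is released when the forked procedure returns, and the demo trace are all hypothetical, and the return target is assumed to be the address immediately after the call instruction.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    addr: int      # address of the instruction (hypothetical toy values)
    op: str        # "call", "ret", or anything else for ordinary work
    size: int = 1  # instruction length; the return target is addr + size


def select_fork_points(trace, num_cores):
    """Return the addresses at which a speculative thread would be forked.

    Assumed policy (not taken from the paper): one core always runs the
    oldest thread; every call forks a thread at its return target if a
    core is free; the core is released when that procedure returns.
    """
    free_cores = num_cores - 1
    call_stack = []  # one (return_addr, forked) entry per active call
    forks = []
    for ins in trace:
        if ins.op == "call":
            ret_addr = ins.addr + ins.size
            forked = free_cores > 0
            if forked:
                free_cores -= 1
                forks.append(ret_addr)
            call_stack.append((ret_addr, forked))
        elif ins.op == "ret" and call_stack:
            _, forked = call_stack.pop()
            if forked:
                free_cores += 1  # parent reached the fork point
    return forks


# Three nested calls followed by one more call after all have returned.
demo_trace = [
    Instr(10, "call"), Instr(20, "call"), Instr(30, "call"),
    Instr(31, "ret"), Instr(32, "ret"), Instr(33, "ret"),
    Instr(40, "call"), Instr(41, "ret"),
]

print(select_fork_points(demo_trace, num_cores=3))  # [11, 21, 41]
```

With three cores, the innermost call at address 30 finds no free core and is not forked, which is the kind of core-count limit the abstract lists among the variables it studies. With a single core the function returns an empty list, since no speculative thread can be started.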


Bibliographic record

Source
ACM Transactions on Architecture and Code Optimization | 2012, Issue 3 | 32 pages
Authors
MAGEDA SHARAFEDDINE; KOMAL JOTHI; HAITHAM AKKARY
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
Design; Algorithms; Performance; Speculative multithreading; Checkpoint processors; Continual flow pipelines; Latency tolerant processors
Date added to database: 2022-08-18 09:49:41

Similar literature

1. Disjoint Out-of-Order Execution Processor [J]. MAGEDA SHARAFEDDINE, KOMAL JOTHI, HAITHAM AKKARY. ACM Transactions on Architecture and Code Optimization. 2012, Issue 3
2. An analysis of the performance impact of wrong-path memory references on out-of-order and runahead execution processors [J]. Mutlu O., Kim H., Armstrong D.N., IEEE Transactions on Computers. 2005, Issue 12
3. Interrupt handling for out-of-order execution processors [J]. Torng H.C., Day M. IEEE Transactions on Computers. 1993, Issue 1
4. A Formal Approach for Detecting Vulnerabilities to Transient Execution Attacks in Out-of-Order Processors [C]. Mohammad Rahmani Fadiheh, Johannes Müller, Raik Brinkmann, ACM/IEEE Design Automation Conference. 2020
5. Systematic code partitioning for the disjoint-memory co-processor accelerated execution model [D]. Mintz, Tiffany M. 2010
6. Brain Networks Underlying Strategy Execution and Feedback Processing in an Efficient Functional Magnetic Resonance Imaging Neurofeedback Training Performed in a Parallel or a Serial Paradigm [O]. Wan Ilma Dewiputri, Renate Schweizer, Tibor Auer. 2021
7. Efficient Methods for Out-of-Order Load/Store Execution for High-Performance Soft Processors [O]. Henry Wong, Vaughn Betz, Jonathan Rose. 2016
