MLP-Aware Dynamic Instruction Window Resizing in Superscalar Processors for Adaptively Exploiting Available Parallelism

Yuya KORA; Kyohei YAMAGUCHI; Hideki ANDO

Abstract

Single-thread performance has not improved much over the past few years, despite an ever-increasing transistor budget. One reason is the speed gap between the processor and main memory, known as the memory wall. A promising way to overcome the memory wall is aggressive out-of-order execution, in which the instruction window resources are greatly enlarged to exploit memory-level parallelism (MLP). However, simply enlarging the window resources lengthens the clock cycle time. Pipelining the resources solves this problem, but it in turn prevents instruction-level parallelism (ILP) from being exploited, because issuing instructions then requires multiple clock cycles. This paper proposes a dynamic scheme that adaptively resizes the instruction window based on the predicted available parallelism, either ILP or MLP. Specifically, if the scheme predicts that MLP is available during execution, the instruction window is enlarged and the window resources are pipelined, thereby exploiting MLP. Conversely, if the scheme predicts that less MLP is available, that is, that ILP is exploitable for improved performance, the instruction window is shrunk and the window resources are de-pipelined, thereby exploiting ILP. Our evaluation results using the SPEC2006 benchmark programs show that the proposed scheme achieves nearly the best performance possible with fixed-size resources. On average, our scheme realizes a performance improvement of 21% over a conventional processor, at an additional cost of only 6% of the area of the conventional processor core, or 3% of the area of the entire processor chip. The evaluation results also show 8% better energy efficiency in terms of 1/EDP (the inverse of the energy-delay product).
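The resizing decision described in the abstract can be illustrated with a minimal sketch. The abstract does not say how MLP is predicted or what window sizes are used, so everything concrete here is an assumption made for illustration: the predictor (a count of recent last-level cache misses compared with a threshold), the names `WindowConfig`, `predict_mlp`, and `choose_window`, and the sizes 64 and 256. It should not be read as the authors' mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowConfig:
    entries: int      # instruction window size (hypothetical value)
    pipelined: bool   # True if the window resources are pipelined


# Hypothetical configurations; the abstract gives no concrete sizes.
SMALL = WindowConfig(entries=64, pipelined=False)   # favors ILP
LARGE = WindowConfig(entries=256, pipelined=True)   # favors MLP


def predict_mlp(llc_misses: int, threshold: int = 2) -> bool:
    """Assumed predictor: treat MLP as available when the number of
    last-level cache misses in the recent interval reaches a threshold."""
    return llc_misses >= threshold


def choose_window(llc_misses: int, threshold: int = 2) -> WindowConfig:
    """Enlarge and pipeline the window when MLP is predicted;
    otherwise shrink and de-pipeline it to exploit ILP."""
    if predict_mlp(llc_misses, threshold):
        return LARGE
    return SMALL


if __name__ == "__main__":
    # Made-up per-interval miss counts, for illustration only.
    for misses in [0, 5, 3, 0, 1]:
        cfg = choose_window(misses)
        print(misses, cfg.entries, "pipelined" if cfg.pipelined else "de-pipelined")
```

The sketch captures only the trade-off stated in the abstract: a large window must be pipelined to keep the clock cycle short, which costs ILP, so the small de-pipelined configuration is chosen whenever little MLP is predicted.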

Bibliographic details

Source: IEICE Transactions on Information and Systems, 2014, No. 12, 14 pages
Authors: Yuya KORA; Kyohei YAMAGUCHI; Hideki ANDO
Original format: PDF
Chinese Library Classification: Radio electronics, telecommunications technology
