Exploiting Execution Locality with a Decoupled Kilo-Instruction Processor

Abstract

Overcoming increasing memory latency is one of the main problems that microprocessor designers have faced over the years. The two basic techniques introduced to mitigate latency are caches and out-of-order execution. However, neither of these solutions is adequate for hiding off-chip memory accesses on the order of 200 cycles or more. In theory, increasing the size of the instruction window would allow much longer latencies to be hidden, but scaling the structures to support thousands of in-flight instructions would be prohibitively expensive. Nevertheless, the distribution of instruction issue times in the presence of L2 cache misses is highly correlated. This paper describes this phenomenon, termed Execution Locality, and shows how it can be exploited with an inexpensive microarchitecture consisting of two linked cores. This Decoupled Kilo-Instruction Processor (D-KIP) is very effective at recovering lost potential performance. Extensive simulations show that speed-ups of up to 379% are possible for numerical benchmarks, thanks to the exploitation of impressive degrees of Memory-Level Parallelism (MLP) and the execution of independent instructions in the shadow of L2 misses.
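
To make the idea of execution locality more concrete, here is a hypothetical toy model. It is not the D-KIP design or the paper's simulation methodology. It assumes an idealized machine with an unbounded instruction window, a 1-cycle latency for every operation except L2-missing loads, and a fixed 200-cycle memory latency. Under these assumptions, issue times fall into two clusters:

- instructions independent of the misses issue almost immediately;
- instructions that depend on a miss issue only after it resolves.

The names `Instr`, `schedule`, `split_by_locality`, `OP_LATENCY`, and `MEMORY_LATENCY` are illustrative, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical toy latencies: every operation except an L2-missing load takes
# 1 cycle; an L2 miss is modelled as a fixed 200-cycle off-chip access.
OP_LATENCY = 1
MEMORY_LATENCY = 200


@dataclass
class Instr:
    name: str
    dest: Optional[str]        # destination register (None if no result)
    srcs: Tuple[str, ...]      # source registers
    l2_miss: bool = False      # True for a load that misses in the L2 cache


def schedule(program: List[Instr]) -> Dict[str, int]:
    """Idealized dataflow schedule: unbounded window and issue width, so each
    instruction issues as soon as all of its source registers are ready."""
    ready_at: Dict[str, int] = {}  # register -> cycle its value becomes available
    issue: Dict[str, int] = {}
    for ins in program:
        # Registers never written in the program are treated as ready at cycle 0.
        t = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        issue[ins.name] = t
        if ins.dest is not None:
            latency = MEMORY_LATENCY if ins.l2_miss else OP_LATENCY
            ready_at[ins.dest] = t + latency
    return issue


def split_by_locality(issue: Dict[str, int],
                      threshold: int = MEMORY_LATENCY) -> Tuple[List[str], List[str]]:
    """Group instructions by issue time: those issuing before `threshold` do not
    wait on any L2 miss; the rest depend (directly or transitively) on one."""
    early = [name for name, t in issue.items() if t < threshold]
    late = [name for name, t in issue.items() if t >= threshold]
    return early, late


program = [
    Instr("ld1", "r1", ("r0",), l2_miss=True),  # load that misses in L2
    Instr("ld2", "r2", ("r0",), l2_miss=True),  # independent miss: overlaps with ld1
    Instr("add1", "r3", ("r0",)),               # independent of both misses
    Instr("add2", "r4", ("r3",)),
    Instr("mul1", "r5", ("r1", "r2")),          # must wait for both misses
    Instr("add3", "r6", ("r5",)),
]

issue = schedule(program)
early, late = split_by_locality(issue)
print(issue)  # {'ld1': 0, 'ld2': 0, 'add1': 0, 'add2': 1, 'mul1': 200, 'add3': 201}
print(early, late)  # ['ld1', 'ld2', 'add1', 'add2'] ['mul1', 'add3']
```

Because `ld1` and `ld2` are independent, both misses are outstanding at cycle 0. As a result, `mul1` issues at cycle 200 rather than the 400 cycles it would wait if the misses were serialized. This is a small instance of memory-level parallelism. Splitting the stream at the miss latency loosely mirrors the intuition of handing the two clusters to two linked cores. The partitioning rule here is an assumption made for illustration, not the policy D-KIP uses.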

Bibliographic Details

Source: International Symposium on High-Performance Computing | 2008 | 12 pages
Authors: Miquel Pericas; Adrian Cristal; Ruben Gonzalez; Daniel A. Jimenez; Mateo Valero
Original format: PDF
Chinese Library Classification: TP3-53
Similar Documents

1. Decoupled Compressed Cache: Exploiting Spatial Locality for Energy Optimization [J]. Sardashti Somayeh, Wood David A. IEEE Micro, 2014, No. 3.

2. Exploitation of Locality for Energy Efficiency for Breadth First Search in Fine-Grain Execution Models [J]. Chen Chen, Souad Koliai, Guang Gao. Tsinghua Science and Technology, 2013, No. 6.

3. Exploiting the Locality of Instruction Execution [J]. Takanori Hayashida, Kazuaki Murakami. IEICE Technical Report, Computer Systems, 2000, No. 248.

4. Exploiting Execution Locality with a Decoupled Kilo-Instruction Processor [C]. Miquel Pericas, Adrian Cristal, Ruben Gonzalez. International Symposium on High-Performance Computing, 2008.

5. Exploiting Processing Locality for Adaptive Computing Systems [D]. Taher, Mohamed Mahmoud Ahmed. 2006.

6. A Novel Pre-Processing Technique for Original Feature Matrix of Electronic Nose Based on Supervised Locality Preserving Projections [O]. Pengfei Jia, Tailai Huang, Li Wang. 2016.

7. Exploiting Execution Locality with a Decoupled Kilo-Instruction Processor [O]. Miquel Pericàs, Ruben González, Daniel A. Jiménez. 2010.

8. SMARTS: Exploiting Temporal Locality and Parallelism through Vertical Execution [R]. Beckman, P., Crotinger, J., Karmesin, S. 1999.