Conference: International Symposium on High-Performance Computing

Exploiting Execution Locality with a Decoupled Kilo-Instruction Processor


Abstract

Overcoming increasing memory latency is one of the main problems that microprocessor designers have faced over the years. The two basic techniques introduced to mitigate latencies are caches and out-of-order execution. However, neither of these solutions is adequate for hiding off-chip memory accesses on the order of 200 cycles or more. Theoretically, increasing the size of the instruction window would allow much longer latencies to be hidden, but scaling the structures to support thousands of in-flight instructions would be prohibitively expensive. The distribution of instruction issue times in the presence of L2 cache misses, however, is highly correlated. This paper describes this phenomenon of Execution Locality and shows how it can be exploited with an inexpensive microarchitecture consisting of two linked cores. This Decoupled Kilo-Instruction Processor (D-KIP) is very effective in recovering lost potential performance. Extensive simulations show that speed-ups of up to 379% are possible for numerical benchmarks thanks to the exploitation of impressive degrees of Memory-Level Parallelism (MLP) and the execution of independent instructions in the shadow of L2 misses.
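
The claim that hiding a 200-cycle miss needs a very large instruction window can be checked with a rough calculation. The sketch below is not from the paper: it is a deliberately simplified model that assumes a single L2 miss blocks the head of the window, that every younger instruction is independent, and that the window fills at the full issue width. The function names and the issue width of 4 are hypothetical, chosen only for illustration.

```python
def window_needed(latency_cycles: int, issue_width: int) -> int:
    """Instructions that must be in flight to keep issuing for the whole miss.

    Assumption: the core issues `issue_width` instructions every cycle.
    """
    return latency_cycles * issue_width


def stall_cycles(latency_cycles: int, window_size: int, issue_width: int) -> int:
    """Cycles lost once the window has filled up behind one blocking miss.

    Assumption: nothing can leave the window until the miss returns.
    """
    cycles_until_full = window_size // issue_width
    return max(0, latency_cycles - cycles_until_full)


if __name__ == "__main__":
    latency = 200  # off-chip access latency quoted in the abstract
    width = 4      # hypothetical issue width
    print("window needed:", window_needed(latency, width))
    for window in (64, 128, 1024):
        print(window, "entries ->", stall_cycles(latency, window, width), "stall cycles")
```

Under these assumptions a 4-wide core would need about 800 in-flight instructions to cover one 200-cycle miss, which is the "kilo-instruction" scale the abstract refers to. A 128-entry window would still stall for most of the miss.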