Published in: Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors

Abstract

Parallelism is the key to continued performance scaling in modern microprocessors. Yet we observe that this parallelism can often contain a surprising amount of instruction redundancy. We propose to exploit this redundancy to improve performance and decrease energy consumption. We propose a multi-threading micro-architecture, Minimal Multi-Threading (MMT), that leverages register renaming and the instruction window to combine the fetch and execution of identical instructions between threads in SPMD applications. While many techniques exploit intra-thread similarities by detecting when a later instruction may use an earlier result, MMT exploits inter-thread similarities by, whenever possible, fetching instructions from different threads together and only splitting them if the computation is unique. With two threads, our design achieves a speedup of 1.15 (geometric mean) over a traditional two-thread SMT processor with a trace cache. With four threads, our design achieves a speedup of 1.25 (geometric mean) over a traditional four-thread SMT processor with a trace cache. These correspond to speedups of 1.5 and 1.84 over a traditional out-of-order processor. Moreover, performance increases in most applications with no power increase, because the increase in overhead is countered by a decrease in cache accesses, leading to a decrease in energy consumption for all applications.
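
The core idea in the abstract (fetch identical instructions from different threads together, and split them only when the computation is unique) can be illustrated with a toy software model. This is only a sketch under simplifying assumptions, not the hardware design from the paper: it assumes the threads run in lockstep, it compares operand values directly instead of using register renaming and the instruction window, and the names `Instr` and `count_executions` are hypothetical.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """One dynamic instruction as seen by a single thread (toy model)."""
    op: str       # opcode, e.g. "add"
    srcs: tuple   # source operand values observed by this thread


def count_executions(streams):
    """Return (baseline, merged) execution counts for lockstep threads.

    streams: one instruction list per thread, all of equal length.
    baseline counts every thread executing its own copy; merged counts
    instructions with the same opcode and operand values only once.
    """
    baseline = 0
    merged = 0
    for slot in zip(*streams):
        baseline += len(slot)      # each thread executes separately
        merged += len(set(slot))   # identical instructions execute once
    return baseline, merged


# Two SPMD threads: the first two instructions are redundant across
# threads, the third differs in one operand (e.g. a thread id).
t0 = [Instr("load", (0x100,)), Instr("add", (1, 2)), Instr("mul", (3, 0))]
t1 = [Instr("load", (0x100,)), Instr("add", (1, 2)), Instr("mul", (3, 1))]

baseline, merged = count_executions([t0, t1])
print(baseline, merged)  # prints "6 4"
```

In this example the two threads share the `load` and `add`, so only the `mul` is split into per-thread executions. The model ignores divergent control flow and timing, both of which a real design has to handle.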
