Computer Architecture Letters

Revisiting Using the Results of Pre-Executed Instructions in Runahead Processors

Abstract

Long-latency cache accesses cause significant performance-impacting delays for both in-order and out-of-order processor systems. To address these delays, runahead pre-execution has been shown to produce speedups by warming up cache structures during stalls caused by long-latency memory accesses. While improving cache-related performance, basic runahead approaches do not otherwise utilize the results of accurately pre-executed instructions during normal operation. This simple model of execution is potentially inefficient and performance-constraining. However, a previous study showed that exploiting the results of accurately pre-executed runahead instructions in out-of-order processors provides little performance improvement over simple re-execution. This work shows that, unlike for out-of-order runahead architectures, the performance improvement from runahead result use in an in-order pipeline is more significant on average, and in some situations it is dramatic. For a set of SPEC CPU2006 benchmarks that experience a performance improvement from basic runahead, adding result use to the pipeline provided an additional speedup of 1.14× (up to 1.48×) for an in-order processor model, compared to only 1.05× (up to 1.16×) for an out-of-order one. For benchmarks with poor data cache locality, the average speedup increased to 1.21× for in-order, compared to only 1.10× for out-of-order.
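To make the runahead-with-result-reuse idea concrete, below is a minimal sketch of the mechanism the abstract describes; it is an idealized toy model written for illustration, not the paper's simulator. The Instr type, the MISS_LATENCY constant, the one-cycle-per-instruction cost model, and the zero-cost commit of reused results are all assumptions. On a long-latency miss the core checkpoints state, pre-executes the following instructions with the miss's destination marked invalid, and, when result reuse is enabled, commits the values of accurately pre-executed instructions instead of re-executing them once the miss returns.

```python
# Minimal, illustrative sketch of runahead execution with result reuse on an
# in-order core. This is a simplified model for exposition only; Instr,
# MISS_LATENCY, and the cycle accounting are assumptions, not the paper's setup.

from dataclasses import dataclass
from typing import Dict, Tuple

MISS_LATENCY = 200   # assumed stall, in cycles, for a long-latency memory access

@dataclass(frozen=True)
class Instr:
    dest: str                       # destination register
    srcs: Tuple[str, ...] = ()      # source registers
    misses: bool = False            # True => a load that misses the data cache

def alu(ins: Instr, regs: Dict[str, int]) -> int:
    # Placeholder "execute" step: sum of the available source operands.
    return sum(regs.get(s, 0) for s in ins.srcs)

def run(program, result_reuse: bool = True) -> int:
    regs: Dict[str, int] = {}
    reused: Dict[int, int] = {}     # index -> value pre-executed during runahead
    cycles = 0
    for i, ins in enumerate(program):
        if i in reused:
            # Result reuse: commit the pre-executed value without re-executing
            # (counted as free in this idealized model).
            regs[ins.dest] = reused[i]
            continue
        cycles += 1
        if ins.misses:
            # Long-latency miss: checkpoint, then pre-execute under the stall.
            ra_regs = dict(regs)
            valid = {ins.dest: False}        # the miss's value is unknown (INV)
            for j in range(i + 1, len(program)):
                ra = program[j]
                ok = all(valid.get(s, True) for s in ra.srcs) and not ra.misses
                valid[ra.dest] = ok
                if ok:
                    ra_regs[ra.dest] = alu(ra, ra_regs)
                    if result_reuse:
                        reused[j] = ra_regs[ra.dest]
                # Misses found here would be prefetched: the basic runahead win.
            cycles += MISS_LATENCY
            regs[ins.dest] = 0               # miss data returns (value illustrative)
        else:
            regs[ins.dest] = alu(ins, regs)
    return cycles

# Example: one miss followed by work independent of it; result reuse lets the
# in-order core skip re-executing that work after the miss returns.
prog = [Instr("r1", misses=True), Instr("r2", ("r3",)), Instr("r4", ("r2",)),
        Instr("r5", ("r1",))]
print(run(prog, result_reuse=False), run(prog, result_reuse=True))
```

In this toy model the benefit of result reuse is simply the skipped re-execution work; intuitively, that matters more on an in-order pipeline, which cannot overlap re-execution with other useful work the way an out-of-order core can, which is consistent with the speedups reported in the abstract.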
