Conference: IEEE International Symposium on High Performance Computer Architecture

Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers

Abstract

High-performance processors employ hardware data prefetching to reduce the negative performance impact of large main memory latencies. While prefetching improves performance substantially on many programs, it can significantly reduce performance on others. Prefetching can also significantly increase memory bandwidth requirements. This paper proposes a mechanism that incorporates dynamic feedback into the design of the prefetcher, both to increase the performance improvement provided by prefetching and to reduce its negative performance and bandwidth impact. Our mechanism estimates prefetcher accuracy, prefetcher timeliness, and prefetcher-caused cache pollution to adjust the aggressiveness of the data prefetcher dynamically. We introduce a new method to track cache pollution caused by the prefetcher at run-time. We also introduce a mechanism that dynamically decides where in the LRU stack to insert prefetched blocks, based on the cache pollution caused by the prefetcher. Using the proposed dynamic mechanism improves average performance by 6.5% on 17 memory-intensive benchmarks in the SPEC CPU2000 suite compared to the best-performing conventional stream-based data prefetcher configuration, while consuming 18.7% less memory bandwidth. Compared to a conventional stream-based data prefetcher configuration that consumes a similar amount of memory bandwidth, feedback-directed prefetching provides 13.6% higher performance. Our results show that feedback-directed prefetching eliminates the large negative performance impact incurred on some benchmarks due to prefetching, and that it is applicable to stream-based prefetchers, global-history-buffer-based delta correlation prefetchers, and PC-based stride prefetchers.
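
The feedback loop described in the abstract can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the paper's hardware design: the abstract gives no thresholds, counter definitions, or update policy, so the threshold constants, the five aggressiveness levels, the `IntervalCounters` fields, and the decision order in `next_level` and `insertion_position` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# All constants below are illustrative assumptions; the abstract specifies none.
ACC_LOW = 0.40          # below this, prefetches are treated as mostly useless
LATE_THRESHOLD = 0.01   # above this, prefetches are treated as arriving late
POLLUTION_LOW = 0.005   # below this, pollution is treated as negligible
POLLUTION_HIGH = 0.25   # at or above this, pollution is treated as severe
MIN_LEVEL, MAX_LEVEL = 1, 5  # hypothetical aggressiveness levels


@dataclass
class IntervalCounters:
    """Hypothetical event counts gathered over one sampling interval."""
    prefetches_sent: int
    prefetches_used: int
    prefetches_late: int
    demand_misses: int
    misses_caused_by_prefetch: int

    def accuracy(self) -> float:
        # Fraction of issued prefetches that were later used by the program.
        return self.prefetches_used / self.prefetches_sent if self.prefetches_sent else 0.0

    def lateness(self) -> float:
        # Fraction of useful prefetches that arrived after the demand request.
        return self.prefetches_late / self.prefetches_used if self.prefetches_used else 0.0

    def pollution(self) -> float:
        # Fraction of demand misses attributed to blocks evicted by prefetches.
        return self.misses_caused_by_prefetch / self.demand_misses if self.demand_misses else 0.0


def next_level(level: int, c: IntervalCounters) -> int:
    """Pick the aggressiveness level for the next interval (assumed policy)."""
    if c.prefetches_sent == 0:
        return level                      # no feedback this interval: keep level
    if c.accuracy() < ACC_LOW:
        step = -1                         # inaccurate: wasting bandwidth, back off
    elif c.pollution() >= POLLUTION_LOW:
        step = -1                         # evicting useful data: back off
    elif c.lateness() > LATE_THRESHOLD:
        step = +1                         # useful but late: run further ahead
    else:
        step = 0                          # accurate, timely, clean: leave as is
    return max(MIN_LEVEL, min(MAX_LEVEL, level + step))


def insertion_position(c: IntervalCounters, ways: int) -> int:
    """Stack position for a prefetched block: 0 is LRU, ways - 1 is MRU."""
    p = c.pollution()
    if p < POLLUTION_LOW:
        return ways // 2                  # low pollution: insert mid-stack
    if p < POLLUTION_HIGH:
        return ways // 4                  # moderate pollution: closer to LRU
    return 0                              # severe pollution: insert at LRU
```

In this model, an interval with accurate but late prefetches and little pollution raises the level by one, while an interval with low accuracy lowers it. Inserting prefetched blocks closer to the LRU end when pollution rises means a useless prefetch is evicted sooner and displaces less demand-fetched data.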
