
Prefetch mechanisms that acquire and exploit application specific knowledge.



Abstract

The large number of cache misses in current applications, coupled with the increasing cache miss latencies of current processor designs, causes significant performance degradation even in aggressive out-of-order processors. This dissertation explores two novel prefetching techniques that reduce I-cache and D-cache misses by acquiring and exploiting application-specific knowledge.

The first part of this dissertation focuses on reducing the I-cache misses of database systems using Call Graph Prefetching (CGP). CGP can be implemented either in software or in hardware. Both implementations rest on the insight that the sequence of function calls in a DBMS is highly predictable. CGP leverages this predictability by analyzing the call graph of a DBMS to prefetch the function that is likely to be called next. We evaluate the performance of CGP on sets of Wisconsin and TPC-H queries running on today's typical out-of-order superscalar processor models and show that CGP reduces the I-cache miss stall time by nearly 50%.

The second part of this dissertation addresses data prefetching. Initially we developed Ancestor Driven Prefetching (ADP). ADP builds ancestor graphs for frequently executed load/store instructions; it exploits the correlation between the values defined by an ancestor of a group of memory instructions and the addresses that group subsequently accesses, issuing prefetches for the whole group. ADP suffered from both accuracy and timeliness problems that limited its success, but the insights gained from it led us to precompute prefetch addresses rather than predict them.

Dependence Graph Precomputation (DGP) takes a novel approach to data prefetching. Once an instruction is fetched from the I-cache into the Instruction Fetch Queue (IFQ), its dependences are determined and stored as pointers alongside the instruction in the IFQ. When a load/store instruction deemed likely to cause a cache miss enters the IFQ, a Dependence Graph Generator follows the dependence pointers still within the IFQ to build the dependence graph of the yet-to-be-executed instructions that will determine that load/store's address. A separate precomputation engine executes these graphs to generate the load/store addresses early enough for accurate prefetching. Our results show that DGP reduces the D-cache miss stall time by 47%. Thus the techniques presented in this dissertation, CGP and DGP, each take us about halfway from an already highly tuned baseline system toward the performance of a perfect I-cache and a perfect D-cache, respectively.
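The software-CGP idea described above can be illustrated with a minimal sketch: a training pass records observed caller-to-callee edges of the call graph, and on function entry the prefetcher targets the callee most frequently invoked from that function in the past. This is an illustrative simplification, not the dissertation's implementation; the function names and the single most-likely-callee policy are assumptions for the example.

```python
from collections import defaultdict, Counter

class CallGraphPrefetcher:
    """Sketch of software Call Graph Prefetching: learn caller -> callee
    frequencies from a training run, then on each function entry prefetch
    the code of the callee most likely to be invoked next."""

    def __init__(self):
        # For each caller, a histogram of observed callees.
        self.successors = defaultdict(Counter)

    def record_call(self, caller, callee):
        # Training pass: note one observed edge in the call graph.
        self.successors[caller][callee] += 1

    def on_function_entry(self, func):
        # Prefetch decision: return the most frequent past callee of
        # `func` (a stand-in for issuing I-cache prefetches for its code),
        # or None if `func` was never seen calling anything.
        hist = self.successors.get(func)
        if not hist:
            return None
        likely_callee, _ = hist.most_common(1)[0]
        return likely_callee
```

For example, if a (hypothetical) DBMS routine `create_record` was seen calling `find_page_in_buffer` more often than `lock_record` during training, entering `create_record` triggers a prefetch of `find_page_in_buffer`.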
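The DGP mechanism can likewise be sketched in miniature: instructions entering the IFQ get pointers to their in-queue producers; a generator walks those pointers backward from a likely-to-miss load to collect its address-computing slice; and a precomputation engine executes the slice on a copy of the register state to obtain the effective address early. The three-instruction ISA and register values below are invented for illustration; the real design operates on hardware structures, not Python objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Instr:
    op: str                  # 'addi', 'add', or 'load'
    dest: Optional[str]      # destination register (None for a plain load)
    srcs: List[str]          # source registers
    imm: int = 0             # immediate / displacement
    deps: List[int] = field(default_factory=list)  # dependence pointers (IFQ slots)

def link_dependences(ifq):
    # As instructions enter the IFQ, point each source register at the
    # most recent producer of that register still in the queue.
    last_writer = {}
    for i, ins in enumerate(ifq):
        ins.deps = [last_writer[r] for r in ins.srcs if r in last_writer]
        if ins.dest is not None:
            last_writer[ins.dest] = i

def address_slice(ifq, load_idx):
    # Dependence Graph Generator: follow pointers backward from the load
    # to collect the instructions that feed its address computation.
    in_slice, work = set(), [load_idx]
    while work:
        i = work.pop()
        if i not in in_slice:
            in_slice.add(i)
            work.extend(ifq[i].deps)
    return sorted(in_slice)  # program order; deps only point backward

def precompute_address(ifq, load_idx, regs):
    # Precomputation engine: execute the slice on a copy of the register
    # file and return the load's effective address ahead of execution.
    regs = dict(regs)
    for i in address_slice(ifq, load_idx):
        ins = ifq[i]
        if ins.op == 'addi':
            regs[ins.dest] = regs.get(ins.srcs[0], 0) + ins.imm
        elif ins.op == 'add':
            regs[ins.dest] = regs.get(ins.srcs[0], 0) + regs.get(ins.srcs[1], 0)
        elif ins.op == 'load':
            return regs.get(ins.srcs[0], 0) + ins.imm
    return None
```

Given the fragment `addi r1, r0, 8; add r2, r1, r3; load r4, 4(r2)` with `r0 = 100` and `r3 = 1000`, the engine computes the slice `r1 = 108`, `r2 = 1108` and returns the address `1112` before the load itself issues.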
