IEEE Transactions on Computers

A performance study of instruction cache prefetching methods


Abstract

Prefetching methods for instruction caches are studied via trace-driven simulation. The two primary methods are "fall-through" prefetch (sometimes referred to as "one block lookahead") and "target" prefetch. Fall-through prefetches are for sequential line accesses, and a key parameter is the distance from the end of the current line where the prefetch for the next line is initiated. Target prefetches work also for nonsequential line accesses. A prediction table is used and a key aspect is the prediction algorithm implemented by the table. Fall-through prefetch and target prefetch each improve performance significantly. When combined in a hybrid algorithm, their performance improvement is nearly additive. An instruction cache using a combined target and fall-through method can provide the same performance as a two to four times larger cache that does not prefetch. A good prediction method must not only be accurate, but prefetches must be initiated early enough to allow time for the instructions to return from main memory. To quantify this, we define a "prefetch efficiency" measure that reflects the amount of memory fetch delay that may be successfully hidden by prefetching. The better prefetch methods (in terms of miss rate) also have very high efficiencies, hiding approximately 90 percent of the miss delay for prefetched lines. Another performance measure of interest is memory traffic. Without prefetching, large line sizes give better hit rates; with prefetching, small line sizes tend to give better overall hit rates. Because smaller line sizes tend to reduce memory traffic, the top-performing prefetch caches produce less memory traffic than the top-performing nonprefetch caches of the same size.
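The two mechanisms described in the abstract can be pictured with a short trace-driven sketch. The Python code below is a simplified, hypothetical illustration of a direct-mapped instruction cache that combines fall-through (next-line) prefetch with a target-prefetch prediction table; it is not the simulator used in the paper, and the line size, cache capacity, fetch-ahead distance, and single-successor prediction policy are illustrative assumptions.

```python
# A minimal sketch (assumptions noted below), not the paper's simulator:
# a direct-mapped instruction cache with hybrid prefetching that combines
# fall-through (next-line) prefetch with a target-prefetch prediction table.

LINE_SIZE = 32    # bytes per cache line (illustrative)
NUM_LINES = 256   # direct-mapped capacity in lines (illustrative)
FETCH_AHEAD = 8   # fall-through distance: prefetch the next line once the
                  # fetch address is within this many bytes of the line end

class PrefetchingICache:
    def __init__(self):
        self.lines = [None] * NUM_LINES  # index -> resident line address
        self.target_table = {}           # line address -> last nonsequential successor
        self.hits = self.misses = self.prefetches = 0

    def _resident(self, line_addr):
        return self.lines[line_addr % NUM_LINES] == line_addr

    def _install(self, line_addr):
        self.lines[line_addr % NUM_LINES] = line_addr

    def access(self, pc, prev_line=None):
        """Simulate one instruction fetch; returns the line it touched."""
        line_addr = pc // LINE_SIZE
        if self._resident(line_addr):
            self.hits += 1
        else:
            self.misses += 1
            self._install(line_addr)

        # Target prefetch: on a nonsequential line change, record the observed
        # successor of the previous line; on every access, prefetch whatever
        # successor the table predicts for the line we are in now.
        if prev_line is not None and line_addr not in (prev_line, prev_line + 1):
            self.target_table[prev_line] = line_addr
        predicted = self.target_table.get(line_addr)
        if predicted is not None and not self._resident(predicted):
            self._install(predicted)
            self.prefetches += 1

        # Fall-through prefetch: near the end of the current line, prefetch
        # the sequentially next line if it is not already resident.
        if LINE_SIZE - (pc % LINE_SIZE) <= FETCH_AHEAD and not self._resident(line_addr + 1):
            self._install(line_addr + 1)
            self.prefetches += 1

        return line_addr

if __name__ == "__main__":
    # Toy "trace": a 64-instruction loop body executed ten times; the backward
    # branch at the end of each iteration exercises the target prefetcher.
    trace = [0x1000 + 4 * i for i in range(64)] * 10
    cache = PrefetchingICache()
    prev = None
    for pc in trace:
        prev = cache.access(pc, prev)
    print(f"hits={cache.hits} misses={cache.misses} prefetches={cache.prefetches}")
```

A complete evaluation along the lines of the paper would also attach a memory latency to each fetch, so that the fraction of miss delay hidden by timely prefetches (the paper's "prefetch efficiency") and the resulting memory traffic could be measured; the sketch above only tracks hit, miss, and prefetch counts.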
