DStride: data-cache miss-address-based stride prefetching scheme for multimedia processors

Abstract

Prefetching reduces cache miss latency by moving data up the memory hierarchy before it is actually needed. Recent hardware-based stride prefetching techniques mostly rely on processor pipeline information (e.g., the program counter and the branch prediction table) for prediction. Continuing developments in processor microarchitecture drastically change core pipeline design, so existing hardware-based stride prefetching techniques must be adapted to each new processor architecture. In this paper we present a new hardware-based stride prefetching technique, called DStride, that is independent of processor pipeline design changes. In this design, the first-level data cache miss address stream is used for stride prediction. The miss addresses are separated into a load stream and a store stream to increase the efficiency of the predictor. Each stream is checked separately against its recent miss addresses to detect strides. The detected steady strides are maintained in a table that also performs look-ahead stride prefetching when the processor's stride reference rate is higher than the prefetch request service rate. We evaluated our design with multimedia workloads using execution-driven simulation with the SimpleScalar toolset. Our experiments show that DStride is very effective in reducing overall pipeline stalls due to cache miss latency, especially for stride-intensive applications such as multimedia workloads.
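
The abstract gives no implementation details, so the following is only a minimal software sketch of the general idea it describes: stride detection driven by the data-cache miss address stream, with loads and stores tracked separately. Every name and parameter here (`MissStreamPredictor`, `DStrideSketch`, `history_size`, `table_size`, `steady_threshold`, `lookahead`) is hypothetical and not taken from the paper. In particular, the fixed `lookahead` distance is a simplification standing in for the rate-dependent look-ahead the abstract mentions, and picking the smallest nonzero delta as the candidate stride is an assumption made for illustration.

```python
from collections import deque


class StrideEntry:
    """One candidate stride: last matching miss address, stride, confidence."""

    def __init__(self, last_addr, stride):
        self.last_addr = last_addr
        self.stride = stride
        self.confidence = 1  # the stride has been observed once so far


class MissStreamPredictor:
    """Stride detector for a single miss stream (hypothetical design)."""

    def __init__(self, history_size=4, table_size=8,
                 steady_threshold=2, lookahead=1):
        self.history = deque(maxlen=history_size)  # recent miss addresses
        self.table = deque(maxlen=table_size)      # FIFO of StrideEntry
        self.steady_threshold = steady_threshold
        self.lookahead = lookahead

    def on_miss(self, addr):
        """Record a miss address; return the addresses to prefetch."""
        prefetches = []
        for entry in self.table:
            # Does this miss continue a stride that is already tracked?
            if entry.last_addr + entry.stride == addr:
                entry.last_addr = addr
                entry.confidence += 1
                if entry.confidence >= self.steady_threshold:
                    # Steady stride: prefetch `lookahead` strides ahead.
                    prefetches.append(addr + entry.stride * self.lookahead)
                break
        else:
            # No tracked stride matched: compare against recent misses and
            # start tracking the smallest nonzero delta (an assumption).
            deltas = [addr - prev for prev in self.history if prev != addr]
            if deltas:
                self.table.append(StrideEntry(addr, min(deltas, key=abs)))
        self.history.append(addr)
        return prefetches


class DStrideSketch:
    """Keeps separate predictors for the load and store miss streams."""

    def __init__(self, **kwargs):
        self.streams = {
            "load": MissStreamPredictor(**kwargs),
            "store": MissStreamPredictor(**kwargs),
        }

    def on_miss(self, kind, addr):
        return self.streams[kind].on_miss(addr)
```

With the default parameters, feeding the load stream two interleaved stride patterns (misses at 100, 5000, 164, 5008, 228, 5016) produces no prefetches for the first four misses, then a prefetch of 292 on the miss at 228 and of 5024 on the miss at 5016. Because the two streams are independent, store misses never disturb the stride state built from load misses.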
