Temporal instruction fetch streaming


Abstract

L1 instruction-cache misses pose a critical performance bottleneck in commercial server workloads. Cache access latency constraints preclude L1 instruction caches large enough to capture the application, library, and OS instruction working sets of these workloads. To cope with capacity constraints, researchers have proposed instruction prefetchers that use branch predictors to explore future control flow. However, such prefetchers suffer from several fundamental flaws: their lookahead is limited by branch prediction bandwidth, their accuracy suffers from geometrically-compounding branch misprediction probability, and they are ignorant of the cache contents, frequently predicting blocks already present in L1. Hence, L1 instruction misses remain a bottleneck. We propose Temporal Instruction Fetch Streaming (TIFS)—a mechanism for prefetching temporally-correlated instruction streams from lower-level caches. Rather than explore a program’s control flow graph, TIFS predicts future instruction-cache misses directly, through recording and replaying recurring L1 instruction miss sequences. In this paper, we first present an information-theoretic offline trace analysis of instruction-miss repetition to show that 94% of L1 instruction misses occur in long, recurring sequences. Then, we describe a practical mechanism to record these recurring sequences in the L2 cache and leverage them for instruction-cache prefetching. Our TIFS design requires less than 5% storage overhead over the baseline L2 cache and improves performance by 11% on average and 24% at best in a suite of commercial server workloads.
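The record-and-replay idea in the abstract can be illustrated with a small software model. This is only a sketch under simplifying assumptions, not the hardware design from the paper: the class name `MissStreamPrefetcher`, the `lookahead` parameter, the unbounded Python list used as the miss log, and the dictionary used as the index are all hypothetical choices made for illustration. The model logs each L1 instruction miss, and when a block misses again it predicts the blocks that followed that block's previous occurrence in the log.

```python
class MissStreamPrefetcher:
    """Toy model: log L1-I miss addresses and replay what followed the last occurrence.

    Hypothetical illustration only; a real design would use bounded storage.
    """

    def __init__(self, lookahead=4):
        self.log = []      # miss history in program order (unbounded here for simplicity)
        self.index = {}    # block address -> position of its most recent entry in the log
        self.lookahead = lookahead

    def on_miss(self, block):
        """Record a miss and return the blocks predicted to miss next."""
        predictions = []
        pos = self.index.get(block)
        if pos is not None:
            # Replay the misses that followed this block last time.
            predictions = self.log[pos + 1 : pos + 1 + self.lookahead]
        self.index[block] = len(self.log)
        self.log.append(block)
        return predictions


def replay_coverage(miss_trace, lookahead=4):
    """Fraction of misses in the trace that the previous prediction had named."""
    if not miss_trace:
        return 0.0
    prefetcher = MissStreamPrefetcher(lookahead)
    predicted = set()
    covered = 0
    for block in miss_trace:
        if block in predicted:
            covered += 1
        predicted = set(prefetcher.on_miss(block))
    return covered / len(miss_trace)


if __name__ == "__main__":
    trace = list("ABCD") * 2   # a miss sequence that recurs once
    print(replay_coverage(trace, lookahead=2))
```

In this toy trace the first pass through A, B, C, D only trains the log, and the second miss on A triggers a replay that covers B, C, and D. Because the predictor works on the miss sequence itself, it needs no branch predictions to look ahead, which is the contrast the abstract draws with branch-predictor-directed prefetchers.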
