IEEE International Parallel and Distributed Processing Symposium Workshops

Mitigating Critical Path Decompression Latency in Compressed L1 Data Caches Via Prefetching



Abstract

Increasing the size of cache memory is a common approach for reducing miss rates and increasing CPU performance. Doing so, however, increases the static and dynamic energy consumption of the cache. Compression can be used to increase the effective capacity of cache memory without physically increasing its size. Compression can also be used to reduce the physical size of the cache, and therefore its energy consumption, while maintaining a reasonable effective capacity. Unfortunately, decompression latency is incurred when accessing compressed data. This latency sits on the critical execution path of the processor and can have a significant impact on performance, especially in the L1 cache. Previous work has used cache prefetching techniques to hide the latency of lower-level memory accesses. Our work combines data prefetching and compression techniques to reduce the impact of decompression latency and improve the feasibility of compression in L1 caches. We evaluate the performance of Last Outcome (LO), Stride (S), and Two-Level (2L) prefetching, as well as hybrid combinations of these methods (S/LO and 2L/S), in combination with Base-Delta-Immediate (BΔI) compression. The results demonstrate that BΔI combined with data prefetching improves performance over BΔI compression alone in the L1 data cache. We find that a 4KB hybrid S/LO prefetcher yields an average speedup of 1.7% and a 1.5% improvement in the energy-delay product of the CPU versus BΔI alone.
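The core idea behind BΔI is that many cache lines hold values with low dynamic range (e.g. pointers into the same region), so the line can be stored as one base value plus narrow deltas. The sketch below is a minimal illustration of that idea, not the paper's hardware implementation; the function names and the single-base, fixed-delta-width simplification are our own assumptions (the real BΔI scheme tries several base/delta size combinations in parallel).

```python
def bdi_compress(words, delta_bytes=1):
    """Try to encode a list of words as (base, deltas).

    Returns (base, deltas) if every word's delta from the first word
    fits in `delta_bytes` signed bytes; otherwise returns None and
    the line would be stored uncompressed.
    """
    base = words[0]
    limit = 1 << (8 * delta_bytes - 1)      # signed range is [-limit, limit)
    deltas = [w - base for w in words]
    if all(-limit <= d < limit for d in deltas):
        return base, deltas
    return None


def bdi_decompress(base, deltas):
    # This add per word is the decompression step that sits on the
    # load critical path, motivating latency-hiding via prefetching.
    return [base + d for d in deltas]


# Example: eight pointers into the same region compress well.
line = [0x1000_0000 + i * 8 for i in range(8)]
packed = bdi_compress(line)
assert packed is not None
assert bdi_decompress(*packed) == line
```

With 1-byte deltas, an 8-word line of 4-byte values shrinks from 32 bytes to roughly 12 (one base plus eight deltas), which is the kind of effective-capacity gain compression trades against decompression latency.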
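Of the prefetchers evaluated, the stride prefetcher is the easiest to sketch: a small table keyed by load PC records the last address and last observed stride, and when the same stride repeats, the next address is predicted. The class below is a minimal software model under our own assumptions (unbounded table, single-stride confidence); it is not the paper's configuration.

```python
class StridePrefetcher:
    """Toy PC-indexed stride prefetcher model."""

    def __init__(self):
        # pc -> (last_addr, last_stride); real hardware would bound
        # this table and tag entries, which we omit for clarity.
        self.table = {}

    def access(self, pc, addr):
        """Record a demand access; return a prefetch address or None."""
        if pc not in self.table:
            self.table[pc] = (addr, 0)
            return None
        last_addr, last_stride = self.table[pc]
        stride = addr - last_addr
        self.table[pc] = (addr, stride)
        if stride == last_stride and stride != 0:
            # Stride confirmed twice in a row: predict the next access.
            return addr + stride
        return None


# A load walking an array with 0x40-byte stride triggers a prefetch
# on its third access.
p = StridePrefetcher()
p.access(pc=0x400, addr=0x1000)
p.access(pc=0x400, addr=0x1040)
assert p.access(pc=0x400, addr=0x1080) == 0x10C0
```

In the combined scheme the paper studies, a prefetch issued this way gives the cache time to decompress the BΔI-encoded line before the demand access arrives, hiding the decompression latency.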
