IEEE International Parallel and Distributed Processing Symposium Workshops

Mitigating Critical Path Decompression Latency in Compressed L1 Data Caches Via Prefetching

Abstract

Increasing the size of cache memory is a common approach for reducing miss rates and increasing performance in a CPU. Doing this, however, increases the static and dynamic energy consumption of the cache. Compression can be used to increase the effective capacity of cache memory without physically increasing its size. Compression can also reduce the physical size of the cache, and therefore its energy consumption, while maintaining a reasonable effective cache capacity. Unfortunately, accessing compressed data incurs a decompression latency. This latency sits on the critical execution path of the processor and can significantly degrade performance, especially when compression is implemented in the L1 cache. Previous work has used cache prefetching techniques to hide the latency of lower-level memory accesses. Our work proposes combining data prefetching and compression techniques to reduce the impact of decompression latency and improve the feasibility of compression in L1 caches. We evaluate the performance of Last Outcome (LO), Stride (S), and Two-Level (2L) prefetching, as well as hybrid combinations of these methods (S/LO and 2L/S), in combination with Base-Delta-Immediate (BΔI) compression. The results demonstrate that using BΔI together with data prefetching improves performance over BΔI compression alone in the L1 data cache. We find that a 4KB Hybrid S/LO prefetcher yields an average speedup of 1.7% and improves the CPU's energy-delay product by 1.5% versus BΔI alone.
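To make the trade-off in the abstract concrete, the following is a minimal sketch of the Base-Delta-Immediate idea: a cache line whose words cluster near a common base value can be stored as one base plus narrow deltas, and decompression reconstructs each word with an add. The function names, word size, and delta width here are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of Base-Delta-Immediate (BDI) compression, assuming a
# cache line of 4-byte words and a single base with fixed-width deltas.

def bdi_compress(line, delta_bytes=1):
    """Try to encode a list of word values as (base, deltas).

    Returns (base, deltas) if every word's offset from the base fits in
    a signed delta of `delta_bytes` bytes; otherwise returns None and
    the line would be stored uncompressed.
    """
    base = line[0]
    limit = 1 << (8 * delta_bytes - 1)      # signed range of one delta
    deltas = [w - base for w in line]
    if all(-limit <= d < limit for d in deltas):
        return base, deltas
    return None

def bdi_decompress(base, deltas):
    # The add per word is the decompression step; in hardware this is
    # the extra L1 critical-path latency that prefetching tries to hide.
    return [base + d for d in deltas]

# Example: pointers into the same region compress well under BDI.
line = [0x40001000, 0x40001008, 0x40001010, 0x40001018]
packed = bdi_compress(line)                 # one base + four 1-byte deltas
assert packed is not None
assert bdi_decompress(*packed) == line
```

The example line shrinks from four 4-byte words to one 4-byte base plus four 1-byte deltas, which is the capacity gain the abstract trades against decompression latency.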

