Computer Architecture News
BEAR: Techniques for Mitigating Bandwidth Bloat in Gigascale DRAM Caches
Abstract

Die stacking memory technology can enable gigascale DRAM caches that can operate at 4x-8x higher bandwidth than commodity DRAM. Such caches can improve system performance by servicing data at a faster rate when the requested data is found in the cache, potentially increasing the memory bandwidth of the system by 4x-8x. Unfortunately, a DRAM cache uses the available memory bandwidth not only for data transfer on cache hits, but also for other secondary operations such as cache miss detection, fill on cache miss, and writeback lookup and content update on dirty evictions from the last-level on-chip cache. Ideally, we want the bandwidth consumed for such secondary operations to be negligible, and have almost all the bandwidth be available for transfer of useful data from the DRAM cache to the processor. We evaluate a 1GB DRAM cache, architected as Alloy Cache, and show that even the most bandwidth-efficient proposal for DRAM cache consumes 3.8x bandwidth compared to an idealized DRAM cache that does not consume any bandwidth for secondary operations. We also show that redesigning the DRAM cache to minimize the bandwidth consumed by secondary operations can potentially improve system performance by 22%. To that end, this paper proposes Bandwidth Efficient ARchitecture (BEAR) for DRAM caches. BEAR integrates three components, one each for reducing the bandwidth consumed by miss detection, miss fill, and writeback probes. BEAR reduces the bandwidth consumption of DRAM cache by 32%, which reduces cache hit latency by 24% and increases overall system performance by 10%. BEAR, with negligible overhead, outperforms an idealized SRAM Tag-Store design that incurs an unacceptable overhead of 64 megabytes, as well as Sector Cache designs that incur an SRAM storage overhead of 6 megabytes.
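The abstract's accounting of "bandwidth bloat" can be sketched as a back-of-envelope model: every access to a tag-in-data cache such as Alloy Cache reads tag plus data for miss detection, every miss triggers a fill, and dirty evictions from the on-chip cache trigger writeback probes and updates. The parameters below (line size, tag size, hit rate, dirty-eviction rate) are illustrative assumptions, not figures from the paper.

```python
LINE = 64  # cache line size in bytes (assumed)
TAG = 8    # tag metadata moved with each Alloy-Cache probe (assumed)

def bloat(hit_rate, dirty_evicts_per_access):
    """Bytes moved per access, relative to an idealized cache that
    spends bandwidth only on the data returned for hits."""
    useful = hit_rate * LINE                  # hit data delivered to the core
    probe = LINE + TAG                        # tag+data read on every access (miss detection)
    fill = (1 - hit_rate) * LINE              # install the line on each miss
    writeback = dirty_evicts_per_access * (TAG + LINE)  # probe + content update
    return (probe + fill + writeback) / useful

# With these toy parameters, secondary operations inflate traffic
# to several times the useful hit bandwidth:
print(round(bloat(hit_rate=0.5, dirty_evicts_per_access=0.25), 2))
```

BEAR's three components map onto the three non-useful terms in this model: shrinking the probe cost of miss detection, avoiding unnecessary fills, and filtering writeback probes.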
