Die-Stacked DRAM Caches for Servers: Hit Ratio, Latency, or Bandwidth? Have It All with Footprint Cache

Djordje Jevdjic; Stavros Volos; Babak Falsafi

Abstract

Recent research advocates using large die-stacked DRAM caches to break the memory bandwidth wall. Existing DRAM cache designs fall into one of two categories — block-based and page-based. The former organize data in conventional blocks (e.g., 64B), ensuring low off-chip bandwidth utilization, but co-locate tags and data in the stacked DRAM, incurring high lookup latency. Furthermore, such designs suffer from low hit ratios due to poor temporal locality. In contrast, page-based caches, which manage data at larger granularity (e.g., 4KB pages), allow for reduced tag array overhead and fast lookup, and leverage high spatial locality at the cost of moving large amounts of data on and off the chip. This paper introduces Footprint Cache, an efficient die-stacked DRAM cache design for server processors. Footprint Cache allocates data at the granularity of pages, but identifies and fetches only those blocks within a page that will be touched during the page's residency in the cache — i.e., the page's footprint. In doing so, Footprint Cache eliminates the excessive off-chip traffic associated with page-based designs, while preserving their high hit ratio, small tag array overhead, and low lookup latency. Cycle-accurate simulation results of a 16-core server with up to 512MB Footprint Cache indicate a 57% performance improvement over a baseline chip without a die-stacked cache. Compared to a state-of-the-art block-based design, our design improves performance by 13% while reducing dynamic energy of stacked DRAM by 24%.
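
The allocation policy described in the abstract can be illustrated with a toy model. The sketch below is not the authors' design: it only shows the idea of allocating a whole page while fetching just the blocks expected to be touched. The abstract does not say how footprints are predicted, so the history table keyed by page number, the FIFO eviction order, and all names (`FootprintCacheSketch`, `PageEntry`, `access`) are assumptions made for illustration. The block and page sizes are the example values given in the abstract.

```python
from dataclasses import dataclass

BLOCK_SIZE = 64                            # bytes ("e.g., 64B" in the abstract)
PAGE_SIZE = 4096                           # bytes ("e.g., 4KB pages" in the abstract)
BLOCKS_PER_PAGE = PAGE_SIZE // BLOCK_SIZE  # 64 blocks per page


@dataclass
class PageEntry:
    fetched: int = 0   # bit i set: block i is present in the cache
    touched: int = 0   # bit i set: block i was accessed during this residency


class FootprintCacheSketch:
    """Toy model: page-granularity allocation, block-granularity fetch."""

    def __init__(self, capacity_pages):
        self.capacity_pages = capacity_pages
        self.pages = {}       # page number -> PageEntry (dict keeps insertion order)
        # ASSUMPTION: the footprint seen at the last eviction of a page is
        # replayed the next time that page is allocated. The abstract does
        # not describe the real prediction mechanism.
        self.history = {}
        self.blocks_fetched = 0   # blocks brought in from off-chip memory

    def access(self, addr):
        page, block = divmod(addr // BLOCK_SIZE, BLOCKS_PER_PAGE)
        bit = 1 << block
        entry = self.pages.get(page)

        if entry is None:
            # Page miss: allocate the page, fetch only the predicted footprint
            # plus the block that triggered the miss.
            if len(self.pages) >= self.capacity_pages:
                self._evict()
            predicted = self.history.get(page, 0) | bit
            self.pages[page] = PageEntry(fetched=predicted, touched=bit)
            self.blocks_fetched += bin(predicted).count("1")
            return "page miss"

        entry.touched |= bit
        if entry.fetched & bit:
            return "hit"

        # Page is resident but this block was not predicted: fetch it alone.
        entry.fetched |= bit
        self.blocks_fetched += 1
        return "block miss"

    def _evict(self):
        # ASSUMPTION: FIFO eviction, chosen only to keep the sketch short.
        victim = next(iter(self.pages))
        self.history[victim] = self.pages.pop(victim).touched


cache = FootprintCacheSketch(capacity_pages=1)
for addr in (0, 128, 4096, 0, 128):
    print(hex(addr), cache.access(addr))
print("blocks fetched:", cache.blocks_fetched)
```

In this run the second allocation of page 0 fetches both previously touched blocks at once, so the final access to address 128 hits. A cache that fetched whole pages would have moved 64 blocks on each of the three page misses, whereas the sketch moves 5 blocks in total.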

Bibliographic record

Source
Computer architecture news, 2013, No. 3, pp. 404-415 (12 pages)
Authors
Djordje Jevdjic; Stavros Volos; Babak Falsafi
Original format: PDF
Language: English
