Conference: IEEE/ACM International Symposium on Microarchitecture

Efficiently enabling conventional block sizes for very large die-stacked DRAM caches

Abstract

Die-stacking technology enables multiple layers of DRAM to be integrated with multicore processors. A promising use of stacked DRAM is as a cache, since its capacity is insufficient to be all of main memory (for all but some embedded systems). However, a 1GB DRAM cache with 64-byte blocks requires 96MB of tag storage. Placing these tags on-chip is impractical (larger than on-chip L3s) while putting them in DRAM is slow (two full DRAM accesses for tag and data). Larger blocks and sub-blocking are possible, but less robust due to fragmentation. This work efficiently enables conventional block sizes for very large die-stacked DRAM caches with two innovations. First, we make hits faster than just storing tags in stacked DRAM by scheduling the tag and data accesses as a compound access so the data access is always a row buffer hit. Second, we make misses faster with a MissMap that eschews stacked-DRAM access on all misses. Like extreme sub-blocking, our implementation of the MissMap stores a vector of block-valid bits for each “page” in the DRAM cache. Unlike conventional sub-blocking, the MissMap (a) points to many more pages than can be stored in the DRAM cache (making the effects of fragmentation rare) and (b) does not point to the “way” that holds a block (but defers to the off-chip tags). For the evaluated large-footprint commercial workloads, the proposed cache organization delivers 92.9% of the performance benefit of an ideal 1GB DRAM cache with an impractical 96MB on-chip SRAM tag array.
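The tag-storage figure and the MissMap idea in the abstract can be illustrated with a small sketch. The arithmetic uses only the numbers stated above (1GB cache, 64-byte blocks, 96MB of tags), which imply roughly 6 bytes of tag state per block. The `MissMap` class is a hypothetical software model, not the paper's hardware design: the 4KB page size, the method names, and the unbounded dictionary are all assumptions made for illustration.

```python
# Numbers taken from the abstract.
BLOCK_BYTES = 64
CACHE_BYTES = 1 << 30           # 1GB DRAM cache
TAG_STORE_BYTES = 96 << 20      # 96MB of tag storage

num_blocks = CACHE_BYTES // BLOCK_BYTES          # blocks held by the cache
bytes_per_tag = TAG_STORE_BYTES // num_blocks    # implied tag bytes per block

# Assumption for illustration only: a "page" is 4KB, so 64 blocks per page.
PAGE_BYTES = 4096
BLOCKS_PER_PAGE = PAGE_BYTES // BLOCK_BYTES


class MissMap:
    """Hypothetical model: one block-valid bit vector per tracked page.

    It records only whether a block is in the DRAM cache, not which way
    holds it; that lookup is left to the tags stored in the stacked DRAM.
    """

    def __init__(self):
        # page number -> integer used as a BLOCKS_PER_PAGE-bit vector.
        # Unbounded here; real hardware would have a fixed number of entries.
        self._valid = {}

    @staticmethod
    def _split(addr):
        """Return (page number, block index within the page) for an address."""
        return addr // PAGE_BYTES, (addr % PAGE_BYTES) // BLOCK_BYTES

    def is_present(self, addr):
        """True if the block is marked valid; False means a certain miss,
        so the stacked-DRAM access can be skipped entirely."""
        page, blk = self._split(addr)
        return bool((self._valid.get(page, 0) >> blk) & 1)

    def on_fill(self, addr):
        """Set the valid bit when a block is installed in the DRAM cache."""
        page, blk = self._split(addr)
        self._valid[page] = self._valid.get(page, 0) | (1 << blk)

    def on_evict(self, addr):
        """Clear the valid bit when a block leaves the DRAM cache."""
        page, blk = self._split(addr)
        bits = self._valid.get(page, 0) & ~(1 << blk)
        if bits:
            self._valid[page] = bits
        else:
            self._valid.pop(page, None)  # drop entries with no valid blocks


if __name__ == "__main__":
    print(num_blocks, bytes_per_tag)
    mm = MissMap()
    mm.on_fill(0x1040)
    print(mm.is_present(0x1040), mm.is_present(0x1080))
```

In this model a request first consults `is_present`; only when it returns true would the compound tag-and-data access to the stacked DRAM be issued, and otherwise the request can go straight to off-chip memory.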
