2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Exploiting set-level non-uniformity of capacity demand to enhance CMP cooperative caching


Abstract

As the Memory Wall remains a bottleneck for Chip Multiprocessors (CMPs), effective management of CMP last-level caches is of paramount importance in minimizing expensive off-chip memory accesses. For CMPs with private last-level caches, Cooperative Caching (CC) has been proposed to enable capacity sharing among private caches by spilling an evicted block from one cache to another. But this eviction-driven CC does not necessarily improve cache performance, since it implicitly favors applications with many block evictions regardless of their real capacity demand. The recent Dynamic Spill-Receive (DSR) paradigm improves CC by prioritizing, when spilling blocks, applications that benefit more from extra capacity. However, the DSR paradigm exploits only the coarse-grained application-level difference in capacity demand, making it less effective because the non-uniformity exists at a much finer level. This paper (i) highlights the observation of cache set-level non-uniformity of capacity demand, and (ii) presents a novel L2 cache design, named SNUG (Set-level Non-Uniformity identifier and Grouper), that exploits this fine-grained non-uniformity to further enhance the effectiveness of cooperative caching. By utilizing a per-set shadow tag array and saturating counter, SNUG can identify whether a set should spill or receive blocks; by using an index-bit flipping scheme, SNUG can group peer sets for spilling and receiving in a flexible way, capturing more opportunities for cooperative caching. We evaluate our design through extensive execution-driven simulations on quad-core CMP systems. Our results show that for 6 classes of workload combinations our SNUG cache improves CMP throughput by up to 22.3%, with an average of 13.9%, over the baseline configuration, while the state-of-the-art DSR scheme achieves an improvement of only up to 14.5%, and 8.4% on average.
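
The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no implementation details, so everything concrete here is an assumption made for illustration: the 3-bit counter width, the midpoint threshold, the update rule (increment on a shadow-tag hit, decrement on a real hit), the number of shadow ways, and which index bit is flipped. The names `SetMonitor` and `peer_set_candidates` are hypothetical and are not taken from the paper.

```python
from collections import OrderedDict
from dataclasses import dataclass, field

COUNTER_BITS = 3                       # assumed counter width
COUNTER_MAX = (1 << COUNTER_BITS) - 1  # counter saturates at 0 and at this value


@dataclass
class SetMonitor:
    """Hypothetical per-set monitor: a small shadow tag list plus a saturating counter."""
    shadow_ways: int = 4               # assumed number of shadow tags kept per set
    counter: int = COUNTER_MAX // 2    # assumed starting point: the midpoint
    shadow: OrderedDict = field(default_factory=OrderedDict)

    def on_eviction(self, tag: int) -> None:
        # Remember the tag of a block that just left the real set (LRU order).
        self.shadow[tag] = True
        self.shadow.move_to_end(tag)
        if len(self.shadow) > self.shadow_ways:
            self.shadow.popitem(last=False)  # drop the oldest shadow tag

    def on_access(self, tag: int, real_hit: bool) -> None:
        if real_hit:
            # The set's own capacity was enough for this access.
            self.counter = max(0, self.counter - 1)
        elif tag in self.shadow:
            # A miss that extra capacity would have turned into a hit.
            del self.shadow[tag]
            self.counter = min(COUNTER_MAX, self.counter + 1)

    def role(self) -> str:
        # Assumed rule: sets that would benefit from more capacity spill their
        # evicted blocks to peers; the remaining sets can receive.
        return "spill" if self.counter > COUNTER_MAX // 2 else "receive"


def peer_set_candidates(set_index: int, flip_bit: int = 0) -> list[int]:
    """Set indices in a peer cache that may host a spilled block:
    the same index, and the index with one (assumed) bit flipped."""
    return [set_index, set_index ^ (1 << flip_bit)]
```

Under these assumptions, a fresh `SetMonitor` starts as a receiver and switches to spilling once misses on recently evicted tags outnumber real hits. `peer_set_candidates(6)` yields the indices 6 and 7, so a spilled block has two candidate peer sets instead of one.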
