ACM international conference on supercomputing

Locality Utility Co-optimization for Practical Capacity Management of Shared Last Level Caches



Abstract

Shared last-level caches (SLLCs) on chip multiprocessors play an important role in bridging the performance gap between processing cores and main memory. Although there are already many proposals that overcome the weaknesses of the least-recently-used (LRU) replacement policy by optimizing either locality or utility for heterogeneous workloads, very few of them are suitable for practical SLLC designs because they require log_2(associativity) bits per cache line for re-reference interval prediction. The two recently proposed practical replacement policies, TA-DRRIP and SHiP, have significantly reduced this overhead by relying on just 2 bits per line for prediction, but they are oriented towards managing locality only, missing the opportunity offered by utility optimization. This paper is motivated by two key experimental observations: (i) the not-recently-used (NRU) replacement policy, which needs only one bit per line for prediction, can satisfactorily approximate LRU performance; (ii) since locality and utility optimization opportunities are concurrently present in heterogeneous workloads, co-optimizing both is indispensable for higher performance, yet such co-optimization is missing from existing practical SLLC schemes. We therefore propose a novel practical SLLC design, called COOP, which needs just one bit per line for re-reference interval prediction and leverages lightweight per-core locality and utility monitors that profile sampled SLLC sets to guide the co-optimization. COOP improves throughput over LRU by 7.67% on a quad-core CMP with a 4MB SLLC across 200 random workloads, outperforming both recent practical replacement policies at an in-between storage cost of 17.74KB (TA-DRRIP: 4.53% performance improvement with 16KB storage cost; SHiP: 6.00% performance improvement with 25.75KB storage overhead).
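The abstract's first observation is that NRU needs only one re-reference prediction bit per cache line, versus the log_2(associativity) bits per line of a true LRU stack. The following is a minimal, self-contained C++ sketch of such a one-bit NRU cache set, given only as an illustration of that baseline mechanism; it is not the paper's COOP design, and the class and member names (NruSet, access, find_victim) are assumptions introduced here.

    #include <cstdint>
    #include <cstddef>
    #include <vector>

    // Illustrative one-bit NRU replacement for a single cache set.
    // Each way keeps a single "distant re-reference" bit, as in the
    // NRU baseline the abstract refers to.
    class NruSet {
    public:
        explicit NruSet(std::size_t associativity)
            : tags_(associativity, 0), valid_(associativity, false),
              nru_bit_(associativity, true) {}   // 1 = predicted distant re-reference

        // Returns true on a hit; a hit marks the line as recently used.
        bool access(std::uint64_t tag) {
            for (std::size_t w = 0; w < tags_.size(); ++w) {
                if (valid_[w] && tags_[w] == tag) {
                    nru_bit_[w] = false;          // near-immediate re-reference predicted
                    return true;
                }
            }
            fill(tag);                            // miss: insert after evicting an NRU victim
            return false;
        }

    private:
        void fill(std::uint64_t tag) {
            std::size_t victim = find_victim();
            tags_[victim]    = tag;
            valid_[victim]   = true;
            nru_bit_[victim] = true;              // insert with distant re-reference prediction
        }

        // Victim = first invalid way or first way whose NRU bit is set;
        // if every bit is clear, reset all bits and search again.
        std::size_t find_victim() {
            for (;;) {
                for (std::size_t w = 0; w < tags_.size(); ++w) {
                    if (!valid_[w] || nru_bit_[w]) return w;
                }
                for (std::size_t w = 0; w < tags_.size(); ++w) nru_bit_[w] = true;
            }
        }

        std::vector<std::uint64_t> tags_;
        std::vector<bool> valid_;
        std::vector<bool> nru_bit_;
    };

Per the abstract, COOP builds on this one-bit prediction state and steers it with lightweight per-core locality and utility monitors that sample SLLC sets; those monitors are not modeled in this sketch.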
