Venue: ACM International Conference on Supercomputing

Locality & Utility Co-optimization for Practical Capacity Management of Shared Last Level Caches



Abstract

Shared last-level caches (SLLCs) on chip multiprocessors play an important role in bridging the performance gap between processing cores and main memory. Although there are already many proposals targeted at overcoming the weaknesses of the least-recently-used (LRU) replacement policy by optimizing either locality or utility for heterogeneous workloads, very few of them are suitable for practical SLLC designs due to their large overhead of log2(associativity) bits per cache line for re-reference interval prediction. The two recently proposed practical replacement policies, TA-DRRIP and SHiP, have significantly reduced the overhead by relying on just 2 bits per line for prediction, but they are oriented towards managing locality only, missing the opportunity provided by utility optimization. This paper is motivated by two key experimental observations: (i) the not-recently-used (NRU) replacement policy, which entails only one bit per line for prediction, can satisfactorily approximate LRU performance; (ii) since locality and utility optimization opportunities are concurrently present in heterogeneous workloads, co-optimizing both is indispensable to higher performance but is missing from existing practical SLLC schemes. Therefore, we propose a novel practical SLLC design, called COOP, which needs just one bit per line for re-reference interval prediction and leverages lightweight per-core locality & utility monitors that profile sampled SLLC sets to guide the co-optimization. COOP improves throughput over LRU by 7.67% on a quad-core CMP with a 4MB SLLC for 200 random workloads, outperforming both recent practical replacement policies at an in-between cost of 17.74KB of storage overhead (TA-DRRIP: 4.53% performance improvement with 16KB storage cost; SHiP: 6.00% performance improvement with 25.75KB storage overhead).
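The 1-bit NRU mechanism that the abstract builds on can be sketched compactly. Below is a minimal, illustrative single-set model, not the paper's implementation; the class name, the eviction-scan order, and the bulk-reset detail are our assumptions. On a hit the line's bit is set; on a miss the first line with a clear bit is evicted, and if every bit is set, all bits are cleared first (this periodic reset is what makes NRU only an approximation of LRU):

```python
class NRUSet:
    """One set of an NRU-managed cache.

    A single 'recently used' bit per line approximates LRU at a cost of
    1 bit/line instead of log2(associativity) bits/line. Storage check
    (assuming 64B lines, which the abstract does not state): a 4MB SLLC
    has 65,536 lines, so 1-bit prediction state totals 8KB, versus 16KB
    for 2-bit schemes like TA-DRRIP and 32KB for 4-bit (16-way) LRU.
    """

    def __init__(self, assoc=16):
        self.assoc = assoc
        self.tags = [None] * assoc
        self.nru = [0] * assoc  # 1 = recently used, 0 = eviction candidate

    def access(self, tag):
        """Simulate one access; return True on hit, False on miss."""
        if tag in self.tags:
            self.nru[self.tags.index(tag)] = 1  # promote on hit
            return True
        if 0 not in self.nru:           # every line recently used:
            self.nru = [0] * self.assoc  # age the whole set at once
        victim = self.nru.index(0)       # first not-recently-used line
        self.tags[victim] = tag
        self.nru[victim] = 1             # insert as recently used
        return False
```

For example, with a 2-way set, accessing A, B, A, C forces the all-bits-set reset on the miss to C, after which A (not B) is the victim; this coarse aging is the price of the 1-bit budget.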
