ACM/IEEE Annual International Symposium on Computer Architecture

Criticality Aware Tiered Cache Hierarchy: A Fundamental Relook at Multi-Level Cache Hierarchies



Abstract

On-die caches are a popular method to help hide the main memory latency. However, it is difficult to build large caches without substantially increasing their access latency, which in turn hurts performance. To overcome this difficulty, on-die caches are typically built as a multi-level cache hierarchy. One such popular hierarchy, adopted by modern microprocessors, is the three-level cache hierarchy. A three-level cache hierarchy enables a low average hit latency, since most requests are serviced from the faster inner-level caches. This has motivated recent microprocessors to deploy large level-2 (L2) caches that can help further reduce the average hit latency. In this paper, we perform a fundamental analysis of the popular three-level cache hierarchy and examine its performance delivery through the lens of program criticality. Through our detailed analysis we show that the current trend of increasing L2 cache sizes to reduce average hit latency is, in fact, an inefficient design choice. We instead propose the Criticality Aware Tiered Cache Hierarchy (CATCH), which uses an accurate hardware detection of program criticality and a novel set of inter-cache prefetchers to ensure that on-die data accesses that lie on the critical path of execution are served at the latency of the fastest, level-1 (L1) cache. The last-level cache (LLC) serves the purpose of reducing slow memory accesses, thereby making the large L2 cache redundant for most applications. The area saved by eliminating the L2 cache can then be used to create more efficient processor configurations. Our simulation results show that CATCH outperforms the three-level cache hierarchy with a large 1MB L2 and exclusive LLC by an average of 8.4%, and a baseline with a 256KB L2 and inclusive LLC by 10.3%. We also show that CATCH enables a powerful framework to explore broad chip-level area, performance, and power trade-offs in cache hierarchy design. Supported by CATCH, we evaluate radical architecture directions such as eliminating the L2 altogether, and show that such architectures can yield a 4.5% performance gain over the baseline at nearly 30% less area, or improve performance by 7.3% at the same area while reducing energy consumption by 11%.
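The abstract's argument rests on the average-hit-latency arithmetic of a multi-level hierarchy: each level's latency is paid only by requests that miss all inner levels. The sketch below is a back-of-envelope average memory access time (AMAT) model with illustrative, assumed latencies and hit rates (not figures from the paper); it shows how a two-level hierarchy with a slightly higher effective L1 hit rate, as criticality-aware prefetching aims to achieve for critical loads, can match or beat a three-level one.

```python
# Illustrative AMAT (average memory access time) model. All latencies
# (in cycles) and hit rates below are assumed example values, NOT
# figures from the paper.

def amat(levels, mem_latency):
    """levels: list of (latency, hit_rate) pairs, innermost first."""
    total = 0.0
    miss_prob = 1.0  # probability a request reaches this level
    for latency, hit_rate in levels:
        total += miss_prob * latency
        miss_prob *= (1.0 - hit_rate)
    return total + miss_prob * mem_latency

# Three-level hierarchy: L1, large L2, LLC (assumed values).
three_level = amat([(4, 0.90), (14, 0.60), (40, 0.50)], 200)

# Two-level hierarchy (no L2): L1, LLC. Criticality-aware prefetching
# is modeled crudely as higher effective L1 and LLC hit rates.
two_level = amat([(4, 0.95), (40, 0.55)], 200)

print(f"3-level AMAT: {three_level:.1f} cycles")  # 11.0 with these inputs
print(f"2-level AMAT: {two_level:.1f} cycles")    # 10.5 with these inputs
```

Under these assumed numbers the L2-less hierarchy is slightly faster on average, which mirrors the paper's claim that the large L2 is redundant once critical accesses hit in the L1.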


