Computer Architecture News

Towards Hybrid Last Level Caches for Chip-Multiprocessors

Abstract

As CMP platforms are widely adopted, more and more cores are integrated onto the die. To reduce off-chip memory accesses, the last-level cache is usually organized as a distributed shared cache. To avoid hot-spots, cache lines are interleaved across the distributed shared cache slices using a hash function. However, as the number of cores and cache slices in the platform increases, most data references go to remote cache slices, which increases the access latency significantly. In this paper, we propose a hybrid last-level cache that provides some amount of private space and some amount of shared space on each cache slice. For workloads with no sharing, the goal is to provide more hits in the local slice while still keeping the overall miss rate low. For workloads with sufficient sharing, the goal is to allow more sharing in the last-level cache slices. We present hybrid last-level cache design options and study their hit/miss rate behavior for a number of important server applications and multi-programmed workloads. Our simulation results for multi-programmed workloads based on SPEC CINT2000 as well as multithreaded workloads based on commercial server benchmarks (TPCC, SPECjbb, SAP and TPCE) show that this architecture is advantageous, especially since it can improve the local hit rate significantly while keeping the overall miss rate similar to that of a shared cache.
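
The hybrid organization described in the abstract amounts to a two-step lookup: probe the private partition of the requesting core's local slice first, and only on a miss there go to the shared partition of the slice selected by the address hash. The C++ model below is a minimal sketch of that lookup order, not the paper's implementation; the slice count, line size, hash function, and the set-based partition model are all illustrative assumptions.

#include <cstdint>
#include <cstdio>
#include <unordered_set>
#include <vector>

// Hypothetical parameters; the abstract does not specify these values.
constexpr int      kNumSlices = 16;  // one LLC slice per core (assumption)
constexpr uint64_t kLineBytes = 64;  // cache line size (assumption)

// Each slice holds a private partition (for the local core) and a shared
// partition (part of the hash-interleaved distributed shared cache).
struct Slice {
    std::unordered_set<uint64_t> private_part;
    std::unordered_set<uint64_t> shared_part;
};

// Home-slice selection: shared lines are interleaved across slices with a
// simple hash of the line address (XOR-fold, purely illustrative).
int HomeSlice(uint64_t line) {
    return static_cast<int>((line ^ (line >> 7) ^ (line >> 17)) % kNumSlices);
}

enum class Hit { LocalPrivate, HomeShared, Miss };

// Hybrid lookup order: local private partition first (lowest latency), then
// the shared partition of the hash-selected home slice (on-die, possibly
// remote), and only then off-chip memory.
Hit Lookup(const std::vector<Slice>& llc, int core, uint64_t paddr) {
    uint64_t line = paddr / kLineBytes;
    if (llc[core].private_part.count(line))           return Hit::LocalPrivate;
    if (llc[HomeSlice(line)].shared_part.count(line)) return Hit::HomeShared;
    return Hit::Miss;  // overall miss: off-chip access
}

int main() {
    std::vector<Slice> llc(kNumSlices);
    uint64_t addr = 0x7f3a1c40;
    // Install the line in the shared partition of its home slice, then look
    // it up from core 0: this models a shared hit that may be remote.
    llc[HomeSlice(addr / kLineBytes)].shared_part.insert(addr / kLineBytes);
    printf("lookup result: %d\n", static_cast<int>(Lookup(llc, 0, addr)));
}

In this model, a workload with no sharing benefits when its lines are kept in the local private partition, while a workload with sufficient sharing benefits from the hash-interleaved shared partitions, which is the trade-off the hybrid design targets.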