Proceedings of the 3rd workshop on Memory performance issues

Scalable cache memory design for large-scale SMT architectures




The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past decade. Instead, larger unified L2 and L3 caches were introduced. This cache hierarchy has a high overhead due to the principle of containment. It also has a complex design to maintain cache coherence across all levels. Furthermore, this cache hierarchy is not suitable for future large-scale SMT processors, which will demand high-bandwidth instruction and data caches with a large number of ports.

This paper suggests eliminating the cache hierarchy and replacing it with one-level caches for instructions and data. Multiple instruction caches can be used in parallel to scale the instruction fetch bandwidth and the overall cache capacity. A one-level data cache can be split into a number of block-interleaved cache banks to serve multiple memory requests in parallel. An interconnect is used to connect the data cache ports to the different cache banks, which increases the data cache access time. This paper shows that large-scale SMTs can tolerate long data cache hit times. It also shows that small line buffers can enhance performance and reduce the number of ports required to the banked data cache memory.
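As a rough illustration of the block-interleaved banking described above, the sketch below maps addresses to banks and counts how many cycles a group of simultaneous requests would need. The line size, the bank count, the one-request-per-bank-per-cycle rule, and the function names `bank_of` and `cycles_to_serve` are all assumptions made for this example; the abstract does not state them. The sketch also leaves out the interconnect latency and the line buffers.

```python
from collections import defaultdict

LINE_SIZE = 64   # bytes per cache line (assumed, not from the paper)
NUM_BANKS = 8    # number of data cache banks (assumed, not from the paper)


def bank_of(addr, line_size=LINE_SIZE, num_banks=NUM_BANKS):
    """Block-interleaved mapping: consecutive cache lines fall in consecutive banks."""
    return (addr // line_size) % num_banks


def cycles_to_serve(addrs, line_size=LINE_SIZE, num_banks=NUM_BANKS):
    """Cycles needed for a group of requests, assuming each bank is
    single-ported and serves one request per cycle (a simplification)."""
    per_bank = defaultdict(int)
    for addr in addrs:
        per_bank[bank_of(addr, line_size, num_banks)] += 1
    # The most heavily loaded bank determines the total time.
    return max(per_bank.values(), default=0)


if __name__ == "__main__":
    # Four requests to four consecutive lines hit four different banks.
    print(cycles_to_serve([0, 64, 128, 192]))
    # Three requests whose line indices differ by NUM_BANKS collide in bank 0.
    print(cycles_to_serve([0, 512, 1024]))
```

Under these assumptions, requests that touch distinct banks proceed in parallel, while requests that map to the same bank are serialized; this is the kind of conflict that, per the abstract, line buffers can help reduce.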




