Conference: Annual International Symposium on Computer Architecture

Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

Abstract

Projections of computer technology forecast processors with peak performance of 1,000 MIPS in the relatively near future. These processors could easily lose half or more of their performance in the memory hierarchy if the hierarchy design is based on conventional caching techniques. This paper presents hardware techniques to improve the performance of caches.

Miss caching places a small fully-associative cache between a cache and its refill path. Misses in the cache that hit in the miss cache have only a one-cycle miss penalty, as opposed to a many-cycle miss penalty without the miss cache. Small miss caches of 2 to 5 entries are shown to be very effective in removing mapping conflict misses in first-level direct-mapped caches.
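The behaviour described above can be illustrated with a toy simulation. This is a minimal sketch and not the paper's simulator: the function name `simulate_miss_cache`, its parameters, the LRU replacement policy for the miss cache, and the trace of line addresses are all assumptions made for illustration. On a full miss, the requested line is loaded into both the direct-mapped cache and the miss cache.

```python
from collections import OrderedDict

def simulate_miss_cache(line_addrs, num_lines=4, miss_entries=2):
    """Count (direct-mapped hits, miss-cache hits, full misses) for a trace.

    line_addrs   -- sequence of cache-line addresses (hypothetical trace)
    num_lines    -- number of lines in the direct-mapped cache (assumed)
    miss_entries -- entries in the fully-associative miss cache (assumed LRU)
    """
    dm = [None] * num_lines      # direct-mapped cache: one line address per slot
    mc = OrderedDict()           # miss cache, least recently used entry first
    dm_hits = mc_hits = full_misses = 0
    for line in line_addrs:
        idx = line % num_lines   # direct-mapped index
        if dm[idx] == line:
            dm_hits += 1
            continue
        if line in mc:
            # Miss in the direct-mapped cache, hit in the miss cache:
            # this is the short-penalty case.
            mc_hits += 1
            mc.move_to_end(line)
        else:
            # Miss in both: fetch from the next level and load the
            # requested line into the miss cache as well.
            full_misses += 1
            mc[line] = True
            if len(mc) > miss_entries:
                mc.popitem(last=False)   # evict the least recently used entry
        dm[idx] = line
    return dm_hits, mc_hits, full_misses

# Lines 0 and 4 map to the same slot of a 4-line direct-mapped cache.
conflict_trace = [0, 4, 0, 4, 0, 4]
with_miss_cache = simulate_miss_cache(conflict_trace, miss_entries=2)
without_miss_cache = simulate_miss_cache(conflict_trace, miss_entries=0)
```

In this toy trace the two conflicting lines both fit in the two-entry miss cache, so only the first reference to each line goes to the next level; with no miss cache every reference does.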

Victim caching is an improvement to miss caching that loads the small fully-associative cache with the victim of a miss and not the requested line. Small victim caches of 1 to 5 entries are even more effective at removing conflict misses than miss caching.
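A toy simulation can make the difference from miss caching concrete. This is a sketch under stated assumptions, not the paper's implementation: the name `simulate_victim_cache`, its parameters, the LRU policy, and the trace are hypothetical, and the sketch assumes that a victim-cache hit swaps the requested line with the line it displaces from the direct-mapped cache.

```python
from collections import OrderedDict

def simulate_victim_cache(line_addrs, num_lines=4, victim_entries=1):
    """Count (direct-mapped hits, victim-cache hits, full misses) for a trace.

    line_addrs     -- sequence of cache-line addresses (hypothetical trace)
    num_lines      -- number of lines in the direct-mapped cache (assumed)
    victim_entries -- entries in the fully-associative victim cache (assumed LRU)
    """
    dm = [None] * num_lines      # direct-mapped cache: one line address per slot
    vc = OrderedDict()           # victim cache, least recently used entry first
    dm_hits = vc_hits = full_misses = 0
    for line in line_addrs:
        idx = line % num_lines
        if dm[idx] == line:
            dm_hits += 1
            continue
        victim = dm[idx]         # line about to be displaced, if any
        if line in vc:
            # Hit in the victim cache: the requested line leaves the
            # victim cache and the displaced line takes its place (a swap).
            vc_hits += 1
            del vc[line]
        else:
            full_misses += 1     # fetch the requested line from the next level
        dm[idx] = line
        if victim is not None:
            # The victim, not the requested line, goes into the small cache.
            vc[victim] = True
            if len(vc) > victim_entries:
                vc.popitem(last=False)   # evict the least recently used victim
    return dm_hits, vc_hits, full_misses

# Lines 0 and 4 conflict in a 4-line direct-mapped cache.
one_entry = simulate_victim_cache([0, 4, 0, 4, 0, 4], victim_entries=1)
```

Because the small cache never duplicates a line already held in the direct-mapped cache, a single entry is enough to absorb the ping-pong between the two conflicting lines in this trace. A one-entry cache that stored the requested line instead would only hold a copy of what the direct-mapped cache had just loaded.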

Stream buffers prefetch cache lines starting at a cache miss address. The prefetched data is placed in the buffer and not in the cache. Stream buffers are useful in removing capacity and compulsory cache misses, as well as some instruction cache conflict misses. Stream buffers are more effective than previously investigated prefetch techniques at using the next slower level in the memory hierarchy when it is pipelined. An extension to the basic stream buffer, called multi-way stream buffers, is introduced. Multi-way stream buffers are useful for prefetching along multiple intertwined data reference streams.
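The basic single stream buffer can be sketched as a FIFO of sequentially prefetched line addresses. This is an illustrative model only: the name `simulate_stream_buffer`, its parameters, and the trace are hypothetical, prefetch latency is ignored (prefetched lines are assumed to arrive instantly), and only the head of the buffer is compared against the missing address. Multi-way stream buffers are not modelled.

```python
from collections import deque

def simulate_stream_buffer(line_addrs, num_lines=4, depth=4):
    """Count (cache hits, stream-buffer hits, full misses) for a trace.

    line_addrs -- sequence of cache-line addresses (hypothetical trace)
    num_lines  -- number of lines in the direct-mapped cache (assumed)
    depth      -- number of entries in the stream buffer (assumed)
    """
    dm = [None] * num_lines
    buf = deque()            # prefetched line addresses, head at the left
    next_prefetch = None     # next sequential line the buffer would fetch
    cache_hits = buffer_hits = full_misses = 0
    for line in line_addrs:
        idx = line % num_lines
        if dm[idx] == line:
            cache_hits += 1
            continue
        if buf and buf[0] == line:
            # Cache miss that hits the head of the stream buffer: the line
            # moves into the cache and the buffer prefetches one more line.
            buffer_hits += 1
            buf.popleft()
            buf.append(next_prefetch)
            next_prefetch += 1
        else:
            # Miss in both: flush the buffer and restart prefetching at the
            # line after the miss address. Prefetched lines stay in the
            # buffer and are not placed in the cache.
            full_misses += 1
            buf = deque(range(line + 1, line + 1 + depth))
            next_prefetch = line + 1 + depth
        dm[idx] = line
    return cache_hits, buffer_hits, full_misses

# A purely sequential walk over 8 lines, larger than the 4-line cache.
sequential = simulate_stream_buffer(list(range(8)), depth=4)
no_buffer = simulate_stream_buffer(list(range(8)), depth=0)
```

In this sequential trace every reference is a compulsory miss in the cache, yet after the first miss the stream buffer supplies all remaining lines, which is the kind of miss the abstract says stream buffers help remove.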

Together, victim caches and stream buffers reduce the miss rate of the first level in the cache hierarchy by a factor of two to three on a set of six large benchmarks.
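To see why a lower first-level miss rate matters, the standard single-level approximation, average access time = hit time + miss rate × miss penalty, can be evaluated directly. The numbers below (a 1-cycle hit, a 10% miss rate, a 24-cycle miss penalty) are hypothetical and are not taken from the paper; the model also ignores the short penalty paid on a victim-cache or stream-buffer hit.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average cycles per access for a single cache level (simplified model)."""
    return hit_time + miss_rate * miss_penalty

# Hypothetical baseline: 1-cycle hit, 10% miss rate, 24-cycle miss penalty.
baseline = average_access_time(1, 0.10, 24)

# The same cache with its miss rate reduced by a factor of two and of three.
halved = average_access_time(1, 0.10 / 2, 24)
thirded = average_access_time(1, 0.10 / 3, 24)
```

Under these assumed values, the time lost to misses falls in direct proportion to the miss rate, while the hit time is unchanged.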
