
Increasing cache efficiency by eliminating noise and using restrictive compression techniques.


Abstract

With the increasing performance gap between the processor and memory, caches are becoming ever more important for high-performance processors. However, with shrinking feature sizes and rising clock speeds, cache access latencies are increasing, limiting the size of the level 1 cache that can be integrated on the chip. A cache of limited size can significantly increase the cache miss rate, and thus reduce performance.

We investigate restrictive compression techniques for the level 1 data cache that avoid any increase in the cache access latency. The basic technique, All Words Narrow (AWN), compresses a cache block only if all the words in the block are narrow. We extend AWN by providing a small amount of Additional Half-word Space (AHS) in a cache block, so that a compressed block can also accommodate a small number of normal-sized words. We further make AHS adaptive, allocating the additional half-word space to the various cache blocks on demand, and we propose techniques to contain the growth in tag space that is inevitable with compression. Together, these techniques increase the L1 data cache capacity (measured as the average number of valid cache blocks per cycle) by about 50% over a conventional cache, with no or minimal impact on the cache access time. In addition, they have the potential to reduce the average L1 data cache miss rate by about 23%.

Caches are also utilized very inefficiently, because not all of the excess data brought into the cache to exploit spatial locality is actually used. We define cache utilization as the percentage of data brought into the cache that is actually used; our experiments show that the level 1 data cache has a utilization of only about 57%. Increasing the effectiveness of the cache by raising its utilization can significantly reduce cache energy consumption and bandwidth requirements, and make more space available for useful data.

We focus on mechanisms that predict the unused data in a cache block (cache noise). These mechanisms consider the word usage history of cache blocks when predicting the useful data, so that only the useful data is fetched into the cache on a cache miss. In particular, we investigate three flavors of prediction: (i) phase context prediction, which considers the word usage history of the current phase of the program; (ii) memory context prediction, which considers the word usage history of contiguous memory locations; and (iii) code context prediction, which considers the word usage history of a contiguous set of instructions. We found that the code context predictor, using a simple last-word-usage policy, has the best predictability, at about 95%.

Applying cache noise prediction to the L1 data cache, we observed about a 37% improvement in cache utilization, and about 23% and 28% reductions in cache energy consumption and bandwidth requirement, respectively. Cache noise mispredictions increased the miss rate by only 0.1% and had almost no impact on the Instructions Per Cycle (IPC) count. Compared to a sub-blocked cache, fetching only the to-be-referenced data (as identified by the cache noise predictor) improved the miss rate and cache utilization by 97% and 44%, respectively; however, the sub-blocked cache's bandwidth requirement was only about 35% of that of the cache noise prediction based approach.
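To make the restrictive compression condition concrete, here is a minimal sketch in C of the AWN/AHS eligibility test, under assumptions the abstract does not spell out: a 32-bit word is taken to be narrow when its upper half-word is pure zero/sign extension, the block holds eight words, and `ahs_slots` models the number of additional half-word spaces. The names and parameters are illustrative, not the dissertation's actual implementation.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define WORDS_PER_BLOCK 8   /* assumed 32-byte block of 32-bit words */

/* A word is "narrow" if its upper 16 bits carry no information,
   i.e. they are all zeros or (for sign-extended negatives) all ones. */
static bool is_narrow(uint32_t w) {
    uint16_t hi = (uint16_t)(w >> 16);
    return hi == 0x0000 || hi == 0xFFFF;
}

/* AWN: the block may be stored compressed (half-words only) when every
   word is narrow. AHS relaxes this: up to ahs_slots words may be
   normal-sized, their upper halves held in additional half-word space.
   ahs_slots == 0 gives plain AWN. */
static bool awn_compressible(const uint32_t block[WORDS_PER_BLOCK],
                             int ahs_slots) {
    int wide = 0;
    for (int i = 0; i < WORDS_PER_BLOCK; i++)
        if (!is_narrow(block[i]))
            wide++;
    return wide <= ahs_slots;
}

int main(void) {
    uint32_t blk[WORDS_PER_BLOCK] =
        {1, 2, 0xFFFF8000u, 42, 7, 9, 0x12345678u, 0};
    printf("AWN only: %d\n", awn_compressible(blk, 0)); /* 0: one wide word */
    printf("AHS(1):   %d\n", awn_compressible(blk, 1)); /* 1: wide word fits */
    return 0;
}
```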
We also observed that cache noise prediction significantly improves the utilization, and reduces the bandwidth requirement, of prefetching.

We use this highly accurate prediction mechanism to fetch only the to-be-referenced data into the L1 data cache on a cache miss. We then use the cache space thus made available to store words from multiple cache blocks in a single physical cache block, increasing the number of useful words held in the cache. We also propose methods to combine this technique with a value-based approach to increase the cache capacity further. Our experiments show that, with these techniques, we achieve about 57% of the L1 data cache miss rate reduction, and about 60% of the cache capacity increase, observed with a double-sized cache, at only about 25% cache space overhead.

Finally, we show the effect of our techniques on Simultaneous Multi-Threaded (SMT) processors, where cache blocks from multiple threads vie for the precious cache space; here our techniques achieve about 46% of the miss rate reduction of a double-sized cache. We also show their effect under context switching, where the cache contents of one context are polluted by another: our techniques raise the L1 data cache utilization from 30% in the context-switching base case to 80%, and increase the cache capacity by about 55% over that base case.
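As an illustration of the cache noise predictor used above, here is a minimal sketch in C of a code context predictor with the simple last-word-usage policy: the word-usage bit vector recorded when a block fetched from a given code context is evicted becomes the fetch mask predicted for the next miss from that context. The table size, the hash on the missing load's PC, and the eight-word block are assumptions for illustration only.

```c
#include <stdint.h>
#include <stdio.h>

#define PRED_ENTRIES 256           /* assumed predictor table size */
#define ALL_WORDS ((uint8_t)0xFF)  /* one bit per word of an 8-word block */

static uint8_t usage_table[PRED_ENTRIES];  /* last observed usage per context */

/* Code context is approximated by hashing the PC of the missing access. */
static unsigned ctx_index(uint32_t miss_pc) {
    return (miss_pc >> 2) & (PRED_ENTRIES - 1);
}

/* On a miss: predict which words to fetch; fall back to fetching the
   whole block when this context has no history yet. */
uint8_t predict_words(uint32_t miss_pc) {
    uint8_t mask = usage_table[ctx_index(miss_pc)];
    return mask ? mask : ALL_WORDS;
}

/* On eviction: train with the words the block actually used
   (last-word-usage policy: the latest observation wins). */
void train(uint32_t miss_pc, uint8_t used_mask) {
    usage_table[ctx_index(miss_pc)] = used_mask;
}

int main(void) {
    uint32_t pc = 0x400120;
    printf("cold miss: fetch mask %02x\n", predict_words(pc)); /* ff */
    train(pc, 0x0B);   /* words 0, 1 and 3 were referenced */
    printf("next miss: fetch mask %02x\n", predict_words(pc)); /* 0b */
    return 0;
}
```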

Record details

  • Author

    Pujara, Prateek.

  • Author's affiliation

    State University of New York at Binghamton.

  • Degree grantor State University of New York at Binghamton.
  • Subject Engineering, Computer.
  • Degree Ph.D.
  • Year 2010
  • Pages 145 p.
  • Total pages 145
  • Format PDF
  • Language eng
