IEEE Transactions on Computers

Efficient Management of Cache Accesses to Boost GPGPU Memory Subsystem Performance



Abstract

To support the massive number of memory accesses that GPGPU applications generate, GPU memory hierarchies are becoming increasingly complex, and the Last Level Cache (LLC) size grows considerably with each GPU generation. This paper shows that, counter-intuitively, enlarging the LLC brings only marginal performance gains in most applications. In other words, increasing the LLC size scales neither in performance nor in energy consumption. We examine how LLC misses are managed in typical GPUs and find that, in most cases, the way LLC misses are handled is precisely the main performance limiter. This paper proposes a novel approach that addresses this shortcoming by adding a tiny Fetch and Replacement Cache-like structure (FRC) that stores control and coherence information for incoming blocks until they are fetched from main memory. The fetched blocks are then swapped with the victim blocks (i.e., those selected for replacement) in the LLC, and the eviction of these victim blocks is performed from the FRC. This approach improves performance for three main reasons: i) the lifetime of blocks being replaced is extended, ii) the main memory path is unclogged during long bursts of LLC misses, and iii) the average LLC miss latency is reduced. Compared to much larger conventional caches, the proposal improves the LLC hit ratio and memory-level parallelism while reducing the miss latency, and it does so with lower energy consumption and much smaller area requirements. Experimental results show that the proposed FRC scales in performance with the number of GPU compute units and the LLC size: depending on the FRC size, performance improves by 30 to 67 percent for a modern baseline GPU card and by 32 to 118 percent for a larger GPU. In addition, energy consumption is reduced by 49 to 57 percent on average for the larger GPU. These benefits come with a small area increase (7.3 percent) over the LLC baseline.
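
The FRC mechanism described in the abstract can be illustrated with a small simulator-style sketch. The C++ fragment below is a minimal, hedged model, not the authors' implementation: the names (LastLevelCache, FrcEntry, fillFromMemory), the direct-mapped organization, and the 64-byte line size are assumptions made only to show the flow, namely that an FRC entry tracks an in-flight miss while the victim stays resident in the LLC, that the fetched block is swapped with the victim on fill, and that the victim's eviction then proceeds from the FRC.

```cpp
// Minimal sketch (assumed names and organization, not the paper's design)
// of FRC-style miss handling: an FRC entry holds control/coherence state
// for the incoming block; the LLC victim keeps serving hits until the data
// returns, then the blocks are swapped and the victim is evicted via the FRC.
#include <cstdint>
#include <vector>

struct Block {
    uint64_t tag = 0;
    bool valid = false;
    bool dirty = false;      // per-block control/coherence state
};

struct FrcEntry {
    uint64_t missTag;        // tag of the block being fetched from DRAM
    Block victim;            // victim parked here once the fill arrives
};

class LastLevelCache {
public:
    LastLevelCache(size_t numSets, size_t frcEntries)
        : sets_(numSets), frcCapacity_(frcEntries) {}

    // Returns true on an LLC hit; on a miss, allocates an FRC entry that
    // tracks the in-flight fetch while the victim remains valid in the LLC.
    bool access(uint64_t addr) {
        uint64_t tag = addr / 64;                   // 64-byte lines (assumed)
        Block &way = sets_[tag % sets_.size()];     // direct-mapped for brevity
        if (way.valid && way.tag == tag) return true;

        // Miss: the victim is NOT displaced yet, so its lifetime (and chance
        // of further hits) is extended while the fetch is outstanding.
        if (frc_.size() < frcCapacity_)
            frc_.push_back({tag, Block{}});
        // If the FRC is full, the miss stalls, much like a full MSHR file.
        return false;
    }

    // Called when DRAM returns the data: swap the fetched block into the LLC
    // and complete the victim's eviction (writeback if dirty) from the FRC.
    void fillFromMemory(uint64_t tag) {
        for (size_t i = 0; i < frc_.size(); ++i) {
            if (frc_[i].missTag != tag) continue;
            Block &way = sets_[tag % sets_.size()];
            frc_[i].victim = way;            // victim leaves the LLC only now
            way = {tag, true, false};        // fetched block installed in place
            if (frc_[i].victim.dirty) {
                // Victim writeback is issued from the FRC, off the fill path,
                // keeping the memory path unclogged during miss bursts.
            }
            frc_.erase(frc_.begin() + i);    // entry retired after eviction
            return;
        }
    }

private:
    std::vector<Block> sets_;
    std::vector<FrcEntry> frc_;
    size_t frcCapacity_;
};
```

In this sketch the FRC plays the role the abstract assigns it: it decouples the eviction of the victim from the arrival of the fetched block, so the victim's useful lifetime in the LLC grows and its writeback is handled off the critical fill path.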
