IEEE Transactions on Computers

Efficient Management of Cache Accesses to Boost GPGPU Memory Subsystem Performance



Abstract

To support the massive number of memory accesses that GPGPU applications generate, GPU memory hierarchies are becoming increasingly complex, and the Last Level Cache (LLC) size grows considerably with each GPU generation. This paper shows that, counter-intuitively, enlarging the LLC brings only marginal performance gains in most applications. In other words, increasing the LLC size scales neither in performance nor in energy consumption. We examine how LLC misses are managed in typical GPUs and find that, in most cases, the way LLC misses are handled is precisely the main performance limiter. This paper proposes a novel approach that addresses this shortcoming by adding a tiny Fetch and Replacement Cache-like structure (FRC) that stores control and coherence information for incoming blocks until they are fetched from main memory. The fetched blocks are then swapped with the victim blocks (i.e., those selected for replacement) in the LLC, and the eviction of these victim blocks is performed from the FRC. This approach improves performance for three main reasons: i) the lifetime of blocks being replaced is extended, ii) the main memory path is unclogged during long bursts of LLC misses, and iii) the average LLC miss latency is reduced. Compared to much larger conventional caches, the proposal improves the LLC hit ratio and memory-level parallelism while reducing the miss latency, and it does so with lower energy consumption and much smaller area requirements. Experimental results show that the proposed FRC scales in performance with the number of GPU compute units and the LLC size: depending on the FRC size, performance improves by 30 to 67 percent for a modern baseline GPU card and by 32 to 118 percent for a larger GPU. In addition, energy consumption is reduced by 49 to 57 percent on average for the larger GPU. These benefits come with a small area increase (7.3 percent) over the LLC baseline.
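
The FRC mechanism described in the abstract can be illustrated with a small simulator-style sketch. The C++ fragment below is a minimal, hedged model, not the authors' implementation: the names (LastLevelCache, FrcEntry, fillFromMemory), the direct-mapped organization, and the 64-byte line size are assumptions made only to show the flow, namely that an FRC entry tracks an in-flight miss while the victim stays resident in the LLC, that the fetched block is swapped with the victim on fill, and that the victim's eviction then proceeds from the FRC.

```cpp
// Minimal sketch (assumed names and organization, not the paper's design)
// of FRC-style miss handling: an FRC entry holds control/coherence state
// for the incoming block; the LLC victim keeps serving hits until the data
// returns, then the blocks are swapped and the victim is evicted via the FRC.
#include <cstdint>
#include <vector>

struct Block {
    uint64_t tag = 0;
    bool valid = false;
    bool dirty = false;      // per-block control/coherence state
};

struct FrcEntry {
    uint64_t missTag;        // tag of the block being fetched from DRAM
    Block victim;            // victim parked here once the fill arrives
};

class LastLevelCache {
public:
    LastLevelCache(size_t numSets, size_t frcEntries)
        : sets_(numSets), frcCapacity_(frcEntries) {}

    // Returns true on an LLC hit; on a miss, allocates an FRC entry that
    // tracks the in-flight fetch while the victim remains valid in the LLC.
    bool access(uint64_t addr) {
        uint64_t tag = addr / 64;                   // 64-byte lines (assumed)
        Block &way = sets_[tag % sets_.size()];     // direct-mapped for brevity
        if (way.valid && way.tag == tag) return true;

        // Miss: the victim is NOT displaced yet, so its lifetime (and chance
        // of further hits) is extended while the fetch is outstanding.
        if (frc_.size() < frcCapacity_)
            frc_.push_back({tag, Block{}});
        // If the FRC is full, the miss stalls, much like a full MSHR file.
        return false;
    }

    // Called when DRAM returns the data: swap the fetched block into the LLC
    // and complete the victim's eviction (writeback if dirty) from the FRC.
    void fillFromMemory(uint64_t tag) {
        for (size_t i = 0; i < frc_.size(); ++i) {
            if (frc_[i].missTag != tag) continue;
            Block &way = sets_[tag % sets_.size()];
            frc_[i].victim = way;            // victim leaves the LLC only now
            way = {tag, true, false};        // fetched block installed in place
            if (frc_[i].victim.dirty) {
                // Victim writeback is issued from the FRC, off the fill path,
                // keeping the memory path unclogged during miss bursts.
            }
            frc_.erase(frc_.begin() + i);    // entry retired after eviction
            return;
        }
    }

private:
    std::vector<Block> sets_;
    std::vector<FrcEntry> frc_;
    size_t frcCapacity_;
};
```

In this sketch the FRC plays the role the abstract assigns it: it decouples the eviction of the victim from the arrival of the fetched block, so the victim's useful lifetime in the LLC grows and its writeback is handled off the critical fill path.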
