IEEE Transactions on Computers

Efficient Management of Cache Accesses to Boost GPGPU Memory Subsystem Performance


Abstract

To support the massive amount of memory accesses that GPGPU applications generate, GPU memory hierarchies are becoming more and more complex, and the Last Level Cache (LLC) size increases considerably with each GPU generation. This paper shows that, counter-intuitively, enlarging the LLC brings only marginal performance gains in most applications. In other words, increasing the LLC size scales neither in performance nor in energy consumption. We examine how LLC misses are managed in typical GPUs, and we find that in most cases the way LLC misses are managed is precisely the main performance limiter. This paper proposes a novel approach that addresses this shortcoming by leveraging a tiny additional Fetch and Replacement Cache-like structure (FRC) that stores control and coherence information of incoming blocks until they are fetched from main memory. Then, the fetched blocks are swapped with the victim blocks (i.e., those selected to be replaced) in the LLC, and the eviction of such victim blocks is performed from the FRC. This approach improves performance for three main reasons: i) the lifetime of the blocks being replaced is extended, ii) the main memory path is unclogged during long bursts of LLC misses, and iii) the average LLC miss latency is reduced. The proposal improves the LLC hit ratio and memory-level parallelism, and reduces the miss latency compared to much larger conventional caches. Moreover, this is achieved with reduced energy consumption and much lower area requirements. Experimental results show that the proposed FRC scales in performance with the number of GPU compute units and the LLC size: depending on the FRC size, performance improves by 30 to 67 percent for a modern baseline GPU card, and by 32 to 118 percent for a larger GPU. In addition, energy consumption is reduced on average by 49 to 57 percent for the larger GPU. These benefits come with a small area increase (7.3 percent) over the LLC baseline.
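The swap-based miss handling described above can be summarized with a short behavioral sketch. The sketch below is a simplified, assumption-laden illustration (a single LLC set, a synchronous fetch, and an FRC that always has free entries); names such as FRCEntry, LlcSet, and handle_llc_miss are illustrative and do not come from the paper.

```python
# Minimal behavioral sketch (not the authors' implementation) of the
# Fetch and Replacement Cache (FRC) idea described in the abstract.

from collections import OrderedDict

class Block:
    def __init__(self, tag, data=None, dirty=False):
        self.tag = tag
        self.data = data
        self.dirty = dirty

class FRCEntry:
    """Holds control/coherence state for a block while it is in flight
    from main memory, and later the victim block pending eviction."""
    def __init__(self, tag):
        self.tag = tag
        self.fetched_block = None   # filled when memory responds
        self.victim_block = None    # filled at swap time

class LlcSet:
    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()   # tag -> Block, kept in LRU order

    def lookup(self, tag):
        if tag in self.blocks:
            self.blocks.move_to_end(tag)   # hit: update LRU position
            return self.blocks[tag]
        return None

    def swap_in(self, new_block):
        """Insert the fetched block; return the evicted LRU victim (if any)."""
        victim = None
        if len(self.blocks) >= self.ways:
            _, victim = self.blocks.popitem(last=False)
        self.blocks[new_block.tag] = new_block
        return victim

def handle_llc_miss(llc_set, frc, tag, fetch_from_memory, write_back):
    """On an LLC miss, track the in-flight block in the FRC instead of
    reserving an LLC way, so the victim keeps servicing hits until the
    fill returns."""
    entry = FRCEntry(tag)
    frc.append(entry)                      # assumes a free FRC entry exists
    entry.fetched_block = fetch_from_memory(tag)
    # Swap: the fetched block goes into the LLC, the victim moves to the FRC.
    entry.victim_block = llc_set.swap_in(entry.fetched_block)
    # Eviction (write-back) is performed from the FRC, off the fill path.
    if entry.victim_block is not None and entry.victim_block.dirty:
        write_back(entry.victim_block)
    frc.remove(entry)
    return entry.fetched_block
```

The point the abstract makes is visible in swap_in and handle_llc_miss: the victim block stays resident (and hittable) in the LLC set until the fill returns, and its write-back is issued from the FRC rather than from the fill path.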