FineReg: Fine-Grained Register File Management for Augmenting GPU Throughput

Abstract

Graphics processing units (GPUs) include a large amount of hardware resources for parallel thread execution. However, these resources are not fully utilized at runtime, and observed throughput often falls far below peak performance. A major cause is that GPUs cannot deploy enough warps at runtime. The limited size of the register file constrains the number of cooperative thread arrays (CTAs), as one CTA takes up a few tens of kilobytes of registers. We observe that the actual working set size of a CTA is generally much smaller, and therefore there is room for additional CTAs to run. In this paper, we propose a novel GPU architecture called FineReg that improves overall throughput by increasing the number of concurrent CTAs. In particular, FineReg splits the monolithic register file into two regions, one for active CTAs and another for pending CTAs. Using FineReg, the GPU begins normal execution by allocating all registers required by active CTAs. If all warps of a CTA become stalled, FineReg moves the live registers (i.e., the working set) of the CTA to the pending-CTA region and launches an additional CTA by assigning registers to the newly activated CTA. If the registers of either the active-CTA region or the pending-CTA region are used up, FineReg stops introducing additional CTAs and simply performs context switching between active and pending CTAs. Thus, FineReg increases the number of concurrent CTAs by reducing the effective per-CTA register footprint. Experimental results show that FineReg achieves a 32.8% performance improvement over a conventional GPU architecture.
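
To make the allocation policy in the abstract concrete, here is a minimal Python sketch of one SM under FineReg-style register management. It is not the authors' implementation: the class FineRegSM, its methods, the FIFO choice of which pending CTA to resume, the unlimited supply of unlaunched CTAs, and all sizes (a 256 KB register file split into 192 KB active and 64 KB pending, 32 KB per active CTA, 8 KB of live registers per stalled CTA) are assumptions for illustration, not figures from the paper. Because parking a stalled CTA always frees a full allocation in the active region, the sketch only checks the pending-region limit before it switches from launching new CTAs to context switching.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FineRegSM:
    """Toy model of one SM whose register file is split into two regions.

    All sizes are hypothetical (in KB). The kernel is assumed to have an
    unlimited supply of unlaunched CTAs, and every stalled CTA is assumed to
    have the same live-register working set.
    """

    active_kb: int   # region holding the full register allocation of active CTAs
    pending_kb: int  # region holding only the live registers of stalled CTAs
    cta_kb: int      # full register allocation required by one active CTA
    live_kb: int     # live registers (working set) of one stalled CTA
    active: list = field(default_factory=list)
    pending: deque = field(default_factory=deque)
    next_id: int = 0

    def try_launch(self) -> bool:
        """Launch a new CTA if the active region can hold its full allocation."""
        if (len(self.active) + 1) * self.cta_kb > self.active_kb:
            return False
        self.active.append(self.next_id)
        self.next_id += 1
        return True

    def on_cta_stalled(self, cta: int) -> str:
        """Handle an active CTA whose warps are all stalled."""
        if (len(self.pending) + 1) * self.live_kb <= self.pending_kb:
            # Park only the live registers in the pending region; this frees the
            # CTA's full allocation in the active region for an additional CTA.
            self.active.remove(cta)
            self.pending.append(cta)
            self.try_launch()
            return "parked+launched"
        if self.pending:
            # Pending region is used up: stop adding CTAs and context-switch the
            # stalled CTA with the oldest pending CTA (assumed ready to run).
            resumed = self.pending.popleft()
            self.active.remove(cta)
            self.pending.append(cta)
            self.active.append(resumed)
            return "switched"
        return "kept"  # pending region cannot hold even one stalled CTA

    def concurrent_ctas(self) -> int:
        return len(self.active) + len(self.pending)


if __name__ == "__main__":
    # Hypothetical 256 KB register file: 192 KB active + 64 KB pending,
    # 32 KB per CTA when active, 8 KB of live registers when parked.
    sm = FineRegSM(active_kb=192, pending_kb=64, cta_kb=32, live_kb=8)
    while sm.try_launch():
        pass
    print("after initial launch:", sm.concurrent_ctas())  # 6
    for _ in range(10):  # stall the oldest active CTA ten times
        sm.on_cta_stalled(sm.active[0])
    print("after stalls:", sm.concurrent_ctas())  # 14
    print("baseline (full allocation only):", (192 + 64) // 32)  # 8
```

With these made-up numbers, a conventional full-allocation scheme fits 256 // 32 = 8 CTAs, while the split register file holds 6 active plus 8 pending CTAs, i.e. 14 concurrent CTAs. This only illustrates how a smaller effective per-CTA footprint raises concurrency; it says nothing about the 32.8% speedup reported in the paper.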

Bibliographic details

Source
International Symposium on Microarchitecture | 2018 | xxiv, 493 p. | 13 pages
Authors
Yunho Oh; Myung Kuk Yoon; William J. Song; Won Woo Ro
Original format: PDF
Chinese Library Classification (CLC): TP302-532
Keywords
Registers; Graphics processing units; Instruction sets; Throughput; Hardware; System-on-chip; Context

Similar documents

1. FRF: Toward Warp-Scheduler Friendly STT-RAM/SRAM Fine-Grained Hybrid GPGPU Register File Design [J]. Deng Quan, Zhang Youtao, Zhao Zhenyu. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020, No. 10.

2. LTRF: Enabling High-Capacity Register Files for GPUs via Hardware/Software Cooperative Register Prefetching [J]. Mohammad Sadrosadati, Amirhossein Mirhosseini, Seyed Borna Ehsani. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages, 2018, No. 2.

3. Architecture and Compiler Support for GPUs Using Energy-Efficient Affine Register Files [J]. Wang Shao-Chung, Kan Li-Chen, Lee Chao-Lin. ACM Transactions on Design Automation of Electronic Systems, 2018, No. 2.

4. FineReg: Fine-Grained Register File Management for Augmenting GPU Throughput [C]. Yunho Oh, Myung Kuk Yoon, William J. Song. Annual IEEE/ACM International Symposium on Microarchitecture, 2018.

5. A Statistical Ftechin Approach for Effective Management of Physical Register File in Simultaneous Multi Threading Processors [D]. Ramanathan, Madhava Krishnan. 2017.

6. Optimizing Instruction Scheduling and Register Allocation for Register-File-Connected Clustered VLIW Architectures [O]. Haijing Tang, Xu Yang, Siye Wang. 2013.

7. Hi-End: Hierarchical, Endurance-Aware STT-MRAM-Based Register File for Energy-Efficient GPUs [O]. Won Jeon, Jun Hyun Park, Yoonsoo Kim. 2020.

