Scratch That (But Cache This): A Hybrid Register Cache/Scratchpad for GPUs

Jonathan Bailey; John Kloosterman; Scott Mahlke


Abstract

Graphics processing units (GPUs) are throughput-oriented architectures that implement massive multithreading. Large, power-hungry register files are required in GPUs to support the simultaneous execution of thousands of threads on the hardware. Prior work proposed reducing register access energy by adding a small register cache (RC) to the GPU. The cache stores recently referenced registers and services subsequent accesses to these registers, reducing accesses to the main register file. Later work obtained further energy savings by replacing this cache with a compiler-managed scratchpad. We note that registers are allocated to the cache dynamically and reactively whereas registers are allocated to the scratchpad statically and proactively. Our insight is that these allocation schemes are complementary because the cache leverages runtime information unavailable to the compiler and the scratchpad leverages compile-time information unavailable to the cache. Further, there exist register access patterns that are easily captured by one structure but for which the other structure is ineffective. Instead of implementing either an RC or scratchpad alone, we propose dividing temporary register storage capacity between a cache and a scratchpad in order to capture a broader range of register accesses. Given 12 KB of storage per streaming multiprocessor, our hybrid design reduces register energy to 38.7% of the baseline, compared to 47.9% for an RC and 47.1% for a register scratchpad.
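The complementarity argument in the abstract can be illustrated with a toy model. The sketch below is not the authors' simulator or allocation algorithm; it is a minimal illustration under stated assumptions: the register cache is modeled as a small LRU structure, the scratchpad as a set of registers pinned ahead of time for the whole trace, and the access trace, the register names (`r0`, `s*`, `b*`), the `data` list standing in for runtime values, and the two-entry capacity are all hypothetical. It counts only how many accesses fall through to the main register file, not energy.

```python
from collections import OrderedDict


def main_rf_accesses(trace, cache_entries, pinned):
    """Count accesses that fall through to the main register file.

    Toy model (assumption): registers in `pinned` live in the scratchpad,
    everything else goes through an LRU cache with `cache_entries` slots.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for reg in trace:
        if reg in pinned:
            continue  # served by the statically allocated scratchpad
        if reg in cache:
            cache.move_to_end(reg)  # cache hit: refresh recency
            continue
        misses += 1  # served by the main register file
        if cache_entries > 0:
            if len(cache) >= cache_entries:
                cache.popitem(last=False)  # evict the least recently used
            cache[reg] = True
    return misses


def build_trace(data):
    """Hypothetical trace mixing two kinds of reuse.

    r0: reused every iteration but far apart (visible at compile time).
    s*: single-use streaming values that pollute a small cache.
    b*: which register is reused depends on a runtime value.
    """
    trace = []
    for i, value in enumerate(data):
        trace.append("r0")
        trace += [f"s{3 * i + j}" for j in range(3)]
        trace += [f"b{value % 4}"] * 3
    return trace


# Stand-in for values only known at run time (hypothetical).
data = [3, 1, 2, 0, 1, 3, 2, 1]
trace = build_trace(data)

# Same total capacity (two entries) split three ways.
cache_only = main_rf_accesses(trace, cache_entries=2, pinned=set())
scratchpad_only = main_rf_accesses(trace, cache_entries=0, pinned={"r0", "b0"})
hybrid = main_rf_accesses(trace, cache_entries=1, pinned={"r0"})

print(f"total accesses:  {len(trace)}")
print(f"cache only:      {cache_only}")
print(f"scratchpad only: {scratchpad_only}")
print(f"hybrid:          {hybrid}")
```

On this hypothetical trace of 56 accesses, the cache-only split sends 40 accesses to the main register file, the scratchpad-only split 45, and the hybrid split 32. The pinned entry captures the long-distance reuse of `r0` that the small cache evicts, while the single cache entry captures the runtime-dependent reuse of the `b*` registers that static pinning cannot predict. These counts come from the toy model only and are unrelated to the energy figures reported in the abstract.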


Bibliographic details

Source: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018, no. 11, pp. 2779-2789 (11 pages)
Affiliation (all three authors): Electrical Engineering and Computer Science Department, University of Michigan, Ann Arbor, MI, USA
Original format: PDF
Language: English
Keywords: Registers; Graphics processing units; Resource management; Message systems; Kernel; Hardware; Throughput

