Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Scratch That (But Cache This): A Hybrid Register Cache/Scratchpad for GPUs

Abstract

Graphics processing units (GPUs) are throughput-oriented architectures that implement massive multithreading. Large, power-hungry register files are required in GPUs to support the simultaneous execution of thousands of threads on the hardware. Prior work proposed reducing register access energy by adding a small register cache (RC) to the GPU. The cache stores recently referenced registers and services subsequent accesses to these registers, reducing accesses to the main register file. Later work obtained further energy savings by replacing this cache with a compiler-managed scratchpad. We note that registers are allocated to the cache dynamically and reactively, whereas registers are allocated to the scratchpad statically and proactively. Our insight is that these allocation schemes are complementary because the cache leverages runtime information unavailable to the compiler and the scratchpad leverages compile-time information unavailable to the cache. Further, there exist register access patterns that are easily captured by one structure but for which the other structure is ineffective. Instead of implementing either an RC or a scratchpad alone, we propose dividing temporary register storage capacity between a cache and a scratchpad in order to capture a broader range of register accesses. Given 12 KB of storage per streaming multiprocessor, our hybrid design reduces register energy to 38.7% of the baseline, compared to 47.9% for an RC and 47.1% for a register scratchpad.
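
The complementarity claim in the abstract can be illustrated with a toy model. The sketch below is not the paper's design or evaluation: the function `main_rf_accesses`, the register names, the LRU replacement policy, the fixed scratchpad assignment, and the access trace are all hypothetical choices made for illustration. It counts only how many accesses fall through to the main register file, which stands in loosely for energy. The trace mixes one long-lived register (`acc`), which suits static scratchpad allocation, with short-lived temporaries that are reused immediately, which suit a small cache.

```python
from collections import OrderedDict

def main_rf_accesses(trace, cache_entries=0, scratchpad_regs=frozenset()):
    """Count accesses that reach the main register file (toy model).

    Assumptions (not from the paper): the cache is LRU and allocates on
    every miss; scratchpad registers are fixed for the whole trace and
    never touch the main register file.
    """
    cache = OrderedDict()  # register name -> None, ordered oldest to newest
    misses = 0
    for reg in trace:
        if reg in scratchpad_regs:
            continue                   # served by the scratchpad
        if reg in cache:
            cache.move_to_end(reg)     # cache hit: refresh LRU position
            continue
        misses += 1                    # served by the main register file
        if cache_entries > 0:
            cache[reg] = None          # allocate reactively on a miss
            if len(cache) > cache_entries:
                cache.popitem(last=False)  # evict the least recently used
    return misses

# Hypothetical trace: a loop body executed 4 times (28 accesses in total).
trace = ["acc", "t0", "t0", "t1", "t1", "t2", "t2"] * 4

# Same total capacity (2 entries) split three ways.
cache_only = main_rf_accesses(trace, cache_entries=2)
scratch_only = main_rf_accesses(trace, scratchpad_regs=frozenset({"acc", "t0"}))
hybrid = main_rf_accesses(trace, cache_entries=1,
                          scratchpad_regs=frozenset({"acc"}))

print(cache_only, scratch_only, hybrid)  # prints: 16 16 12
```

In this contrived trace the cache alone keeps evicting `acc` before its next use, the scratchpad alone cannot cover all the temporaries, and the split configuration avoids both problems. The numbers show only that such access patterns exist; they say nothing about the energy figures reported in the abstract.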
