ACM/IEEE International Symposium on Computer Architecture

STAG: Spintronic-Tape Architecture for GPGPU Cache Hierarchies


Abstract

General-purpose Graphics Processing Units (GPGPUs) are widely used for executing massively parallel workloads from various application domains. Feeding data to the hundreds to thousands of cores that current GPGPUs integrate places great demands on the memory hierarchy, fueling an ever-increasing demand for on-chip memory. In this work, we propose STAG, a high-density, energy-efficient GPGPU cache hierarchy design using a new spintronic memory technology called Domain Wall Memory (DWM). DWMs inherently offer unprecedented benefits in density by storing multiple bits in the domains of a ferromagnetic nanowire, which logically resembles a bit-serial tape. However, this structure also poses a unique challenge: the bits must be accessed sequentially by performing "shift" operations, resulting in variable and potentially higher access latencies. To address this challenge, STAG utilizes a number of architectural techniques: (i) a hybrid cache organization that employs different DWM bit-cells to realize the different memory arrays within the GPGPU cache hierarchy, (ii) a clustered, bit-interleaved organization, in which the bits in a cache block are spread across a cluster of DWM tapes, allowing parallel access, (iii) tape head management policies that predictively configure DWM arrays to reduce the expected number of shift operations for subsequent accesses, and (iv) a shift-aware promotion buffer (SaPB), in which accesses to the DWM cache are predicted based on intra-warp locality, and locations that would incur a large shift penalty are promoted to a smaller buffer. Over a wide range of benchmarks from the Rodinia, ISPASS and Parboil suites, STAG achieves significant benefits in performance (12.1% over SRAM and 5.8% over STT-MRAM) and energy (3.3X over SRAM and 2.6X over STT-MRAM).
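To make the shift penalty concrete, the following is a minimal toy model in Python, not STAG's actual design. It assumes a tape with a single access port, counts one shift per domain moved, and compares two hypothetical head policies ("stay" and "home") that the abstract does not name. The `Tape`, `total_shifts`, and `block_access_shifts` names are illustrative only.

```python
class Tape:
    """Toy DWM tape: `length` domains and one access port.

    `head` is the index of the domain currently aligned with the port.
    This is an illustrative assumption, not the bit-cell used in STAG.
    """

    def __init__(self, length, home=0):
        self.length = length
        self.home = home
        self.head = home

    def access(self, index):
        """Align domain `index` with the port and return the shifts needed."""
        if not 0 <= index < self.length:
            raise IndexError(index)
        shifts = abs(index - self.head)
        self.head = index
        return shifts


def total_shifts(trace, length, policy):
    """Count critical-path shifts for a sequence of domain indices.

    policy "stay": leave the head where the last access put it.
    policy "home": reset the head to the middle after each access, assuming
                   (hypothetically) the reset happens during idle time and
                   is therefore not counted.
    """
    tape = Tape(length, home=length // 2)
    cost = 0
    for index in trace:
        cost += tape.access(index)
        if policy == "home":
            tape.head = tape.home
    return cost


def block_access_shifts(block_index, tapes):
    """Bit-interleaved block: bit k lives on tapes[k] at domain `block_index`.

    All tapes shift in parallel, so the latency is the largest per-tape
    shift count rather than the sum over the bits of the block.
    """
    return max(tape.access(block_index) for tape in tapes)
```

On an 8-domain tape, the alternating trace `[0, 7, 0, 7]` costs 25 shifts under "stay" and 14 under "home", while the repeated trace `[3, 3, 3, 3]` costs 1 under "stay" and 4 under "home". Neither fixed policy wins on both traces, which is one way to see why the abstract describes head management as predictive.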
