Conference: ACM/IEEE Annual International Symposium on Computer Architecture

Get Out of the Valley: Power-Efficient Address Mapping for GPUs



Abstract

GPU memory systems adopt a multi-dimensional hardware structure to provide the bandwidth necessary to support hundreds to thousands of concurrent threads. On the software side, GPU-compute workloads also use multi-dimensional structures to organize the threads. We observe that these structures can combine unfavorably and create significant resource imbalance in the memory subsystem, causing low performance and poor power-efficiency. The key issue is that which memory address bits exhibit high variability is highly application-dependent. To solve this problem, we first provide an entropy analysis approach tailored for the highly concurrent memory request behavior in GPU-compute workloads. Our window-based entropy metric captures the information content of each address bit of the memory requests that are likely to co-exist in the memory system at runtime. Using this metric, we find that GPU-compute workloads exhibit entropy valleys distributed throughout the lower-order address bits. This indicates that efficient GPU address mapping schemes need to harvest entropy from broad address-bit ranges and concentrate the entropy into the bits used for channel and bank selection in the memory subsystem. This insight leads us to propose the Page Address Entropy (PAE) mapping scheme, which concentrates the entropy of the row, channel and bank bits of the input address into the bank and channel bits of the output address. PAE maps straightforwardly to hardware and can be implemented with a tree of XOR gates. PAE improves performance by 1.31× and power-efficiency by 1.25× compared to state-of-the-art permutation-based address mapping.
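The two ideas in the abstract, a per-bit entropy measured over windows of co-existing requests and an XOR-based remapping into the channel and bank bits, can be illustrated with a toy sketch. It is not the paper's exact metric or the PAE scheme itself: the abstract does not give their definitions, so the sliding-window averaging, the function names (`window_entropy`, `xor_fold_map`), the bit positions, and the XOR masks below are all assumptions chosen for illustration.

```python
import math


def bit_entropy(p):
    """Shannon entropy (in bits) of a binary variable that is 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def window_entropy(addresses, num_bits, window):
    """Per-bit entropy averaged over sliding windows of `window` requests.

    Assumption: a window of consecutive requests stands in for the requests
    that co-exist in the memory system at runtime.
    """
    n_windows = len(addresses) - window + 1
    if n_windows <= 0:
        raise ValueError("need at least `window` addresses")
    totals = [0.0] * num_bits
    for start in range(n_windows):
        chunk = addresses[start:start + window]
        for b in range(num_bits):
            ones = sum((a >> b) & 1 for a in chunk)
            totals[b] += bit_entropy(ones / window)
    return [t / n_windows for t in totals]


def xor_fold_map(addr, select_bits, masks):
    """Set each selection bit to the XOR (parity) of the input bits in its mask.

    Each mask contains the selection bit itself plus higher (row-like) bits
    that are left unchanged, so the mapping stays one-to-one.
    """
    out = addr
    for bit, mask in zip(select_bits, masks):
        parity = bin(addr & mask).count("1") & 1
        out = (out & ~(1 << bit)) | (parity << bit)
    return out


# Hypothetical toy layout: bits 2-3 select channel/bank, bits 8+ act as row bits.
SELECT_BITS = [2, 3]
MASKS = [
    (1 << 2) | (1 << 8) | (1 << 10),  # output bit 2 = bit 2 ^ bit 8 ^ bit 10
    (1 << 3) | (1 << 9) | (1 << 11),  # output bit 3 = bit 3 ^ bit 9 ^ bit 11
]

# A strided pattern whose low 8 bits never change: an "entropy valley".
addresses = [i << 8 for i in range(64)]
before = window_entropy(addresses, num_bits=14, window=16)
mapped = [xor_fold_map(a, SELECT_BITS, MASKS) for a in addresses]
after = window_entropy(mapped, num_bits=14, window=16)

print("entropy of selection bits before:", [before[b] for b in SELECT_BITS])
print("entropy of selection bits after: ", [after[b] for b in SELECT_BITS])
```

In this toy pattern the selection bits carry zero entropy before the mapping, so every request would land on the same channel and bank; after XOR-ing in the varying higher bits, both selection bits reach an entropy of 1.0 over each window. In hardware, each parity corresponds to a small tree of XOR gates per output bit.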
