Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems

ACM/IEEE Annual International Symposium on Computer Architecture

Abstract

Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited off-chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to significantly alleviate this bottleneck by directly connecting a logic layer to the DRAM layers with high bandwidth connections. Recent work has shown promising potential performance benefits from an architecture that connects multiple such 3D-stacked memories and offloads bandwidth-intensive computations to a GPU in each of the logic layers. An unsolved key challenge in such a system is how to enable computation offloading and data mapping to multiple 3D-stacked memories without burdening the programmer such that any application can transparently benefit from near-data processing capabilities in the logic layer. Our paper develops two new mechanisms to address this key challenge. First, a compiler-based technique that automatically identifies code to offload to a logic-layer GPU based on a simple cost-benefit analysis. Second, a software/hardware cooperative mechanism that predicts which memory pages will be accessed by offloaded code, and places those pages in the memory stack closest to the offloaded code, to minimize off-chip bandwidth consumption. We call the combination of these two programmer-transparent mechanisms TOM: Transparent Offloading and Mapping. Our extensive evaluations across a variety of modern memory-intensive GPU workloads show that, without requiring any program modification, TOM significantly improves performance (by 30% on average, and up to 76%) compared to a baseline GPU system that cannot offload computation to 3D-stacked memories.
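
The compiler's offload decision described above can be pictured as a comparison of off-chip traffic saved against off-chip traffic added. The Python sketch below shows one such cost-benefit test under a deliberately simplified traffic model; the constants, function names, and the model itself are illustrative assumptions, not the paper's actual analysis.

    # Hypothetical sketch of a compiler-side cost-benefit test for offloading,
    # assuming a simplified traffic model: a candidate block is worth offloading
    # to the logic-layer GPU only if the off-chip traffic it removes (its loads
    # and stores, now served inside the memory stack) exceeds the traffic it
    # adds (shipping live-in values to the stack and live-out results back).
    # All names and constants are illustrative, not from the paper.

    CACHE_LINE_BYTES = 128   # assumed transfer granularity per memory access
    REG_BYTES = 4            # assumed size of one live register value

    def should_offload(num_loads: int, num_stores: int,
                       live_ins: int, live_outs: int) -> bool:
        """Return True if estimated off-chip bytes saved exceed bytes added."""
        # Traffic the block would otherwise send across the off-chip links.
        saved = (num_loads + num_stores) * CACHE_LINE_BYTES
        # Traffic needed to launch the block remotely and return its results.
        cost = (live_ins + live_outs) * REG_BYTES
        return saved > cost

    # Example: a block with 8 loads, 4 stores, 6 live-ins, 2 live-outs.
    print(should_offload(8, 4, 6, 2))   # True: 1536 bytes saved vs. 32 bytes cost

In this toy model, offloading wins whenever the memory traffic the block would have sent across the off-chip links outweighs the cost of shipping its live-in and live-out values to and from the memory stack.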
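The page-mapping mechanism can be sketched in the same spirit: before an offloaded block runs, predict which pages it will touch (here, naively, from the base address and extent of each array argument) and map those pages to the memory stack that will execute the block, so that its accesses stay local. The page size, data structures, and function names below are likewise illustrative assumptions.

    # Hypothetical sketch of the software/hardware cooperative mapping idea,
    # assuming the pages an offloaded block will access can be predicted from
    # the base addresses and extents of its array arguments before it runs.
    # Predicted pages are mapped into the stack that will execute the block.

    PAGE_SIZE = 4096  # assumed page size

    def predicted_pages(base_addr: int, extent_bytes: int) -> set[int]:
        """Page numbers an offloaded block is predicted to access."""
        first = base_addr // PAGE_SIZE
        last = (base_addr + extent_bytes - 1) // PAGE_SIZE
        return set(range(first, last + 1))

    def place_pages(args, target_stack: int, page_to_stack: dict[int, int]):
        """Map every predicted page to the stack that will run the block."""
        for base, extent in args:
            for page in predicted_pages(base, extent):
                page_to_stack[page] = target_stack  # allocate/migrate there

    # Example: one 16 KB input array, offloaded to memory stack 2.
    mapping: dict[int, int] = {}
    place_pages([(0x10000, 16 * 1024)], target_stack=2, page_to_stack=mapping)
    print(sorted(mapping))  # pages 16..19 mapped to stack 2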


