Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems

ACM/IEEE Annual International Symposium on Computer Architecture

Abstract

Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited off-chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to significantly alleviate this bottleneck by directly connecting a logic layer to the DRAM layers with high bandwidth connections. Recent work has shown promising potential performance benefits from an architecture that connects multiple such 3D-stacked memories and offloads bandwidth-intensive computations to a GPU in each of the logic layers. An unsolved key challenge in such a system is how to enable computation offloading and data mapping to multiple 3D-stacked memories without burdening the programmer such that any application can transparently benefit from near-data processing capabilities in the logic layer. Our paper develops two new mechanisms to address this key challenge. First, a compiler-based technique that automatically identifies code to offload to a logic-layer GPU based on a simple cost-benefit analysis. Second, a software/hardware cooperative mechanism that predicts which memory pages will be accessed by offloaded code, and places those pages in the memory stack closest to the offloaded code, to minimize off-chip bandwidth consumption. We call the combination of these two programmer-transparent mechanisms TOM: Transparent Offloading and Mapping. Our extensive evaluations across a variety of modern memory-intensive GPU workloads show that, without requiring any program modification, TOM significantly improves performance (by 30% on average, and up to 76%) compared to a baseline GPU system that cannot offload computation to 3D-stacked memories.
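
The compiler's offload decision described above can be pictured as a comparison of off-chip traffic saved against off-chip traffic added. The Python sketch below shows one such cost-benefit test under a deliberately simplified traffic model; the constants, function names, and the model itself are illustrative assumptions, not the paper's actual analysis.

    # Hypothetical sketch of a compiler-side cost-benefit test for offloading,
    # assuming a simplified traffic model: a candidate block is worth offloading
    # to the logic-layer GPU only if the off-chip traffic it removes (its loads
    # and stores, now served inside the memory stack) exceeds the traffic it
    # adds (shipping live-in values to the stack and live-out results back).
    # All names and constants are illustrative, not from the paper.

    CACHE_LINE_BYTES = 128   # assumed transfer granularity per memory access
    REG_BYTES = 4            # assumed size of one live register value

    def should_offload(num_loads: int, num_stores: int,
                       live_ins: int, live_outs: int) -> bool:
        """Return True if estimated off-chip bytes saved exceed bytes added."""
        # Traffic the block would otherwise send across the off-chip links.
        saved = (num_loads + num_stores) * CACHE_LINE_BYTES
        # Traffic needed to launch the block remotely and return its results.
        cost = (live_ins + live_outs) * REG_BYTES
        return saved > cost

    # Example: a block with 8 loads, 4 stores, 6 live-ins, 2 live-outs.
    print(should_offload(8, 4, 6, 2))   # True: 1536 bytes saved vs. 32 bytes cost

In this toy model, offloading wins whenever the memory traffic the block would have sent across the off-chip links outweighs the cost of shipping its live-in and live-out values to and from the memory stack.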
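The page-mapping mechanism can be sketched in the same spirit: before an offloaded block runs, predict which pages it will touch (here, naively, from the base address and extent of each array argument) and map those pages to the memory stack that will execute the block, so that its accesses stay local. The page size, data structures, and function names below are likewise illustrative assumptions.

    # Hypothetical sketch of the software/hardware cooperative mapping idea,
    # assuming the pages an offloaded block will access can be predicted from
    # the base addresses and extents of its array arguments before it runs.
    # Predicted pages are mapped into the stack that will execute the block.

    PAGE_SIZE = 4096  # assumed page size

    def predicted_pages(base_addr: int, extent_bytes: int) -> set[int]:
        """Page numbers an offloaded block is predicted to access."""
        first = base_addr // PAGE_SIZE
        last = (base_addr + extent_bytes - 1) // PAGE_SIZE
        return set(range(first, last + 1))

    def place_pages(args, target_stack: int, page_to_stack: dict[int, int]):
        """Map every predicted page to the stack that will run the block."""
        for base, extent in args:
            for page in predicted_pages(base, extent):
                page_to_stack[page] = target_stack  # allocate/migrate there

    # Example: one 16 KB input array, offloaded to memory stack 2.
    mapping: dict[int, int] = {}
    place_pages([(0x10000, 16 * 1024)], target_stack=2, page_to_stack=mapping)
    print(sorted(mapping))  # pages 16..19 mapped to stack 2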


