Conference: IEEE International Symposium on High Performance Computer Architecture (HPCA)

Griffin: Hardware-Software Support for Efficient Page Migration in Multi-GPU Systems



Abstract

As transistor scaling becomes increasingly difficult, scaling the core count on a single GPU chip has also become extremely challenging. As the volume of data processed by today's highly parallel workloads continues to grow, we need scalable solutions that can keep up with this demand. To meet the needs of modern parallel applications, multi-GPU systems offer a promising path to high performance and large memory capacity. However, multi-GPU systems suffer from performance issues associated with GPU-to-GPU communication and data sharing, which severely limit their benefits. Programming multi-GPU systems has been made considerably simpler by the advent of Unified Memory, which enables runtime migration of pages to the GPU on demand. Current multi-GPU systems rely on a first-touch Demand Paging scheme, in which a memory page is migrated from the CPU to the GPU on the first GPU access to that page. The data-sharing nature of GPU applications makes deploying an efficient, programmer-transparent mechanism for inter-GPU page migration challenging. Therefore, following the initial CPU-to-GPU page migration, the page is pinned on that GPU. Future accesses to this page from other GPUs happen at a cache-line granularity; pages are not transferred between GPUs without significant programmer intervention. We observe that this mechanism suffers from two major drawbacks: 1) imbalance in the page distribution across multiple GPUs, and 2) inability to move a page to the GPU that uses it most frequently. Both problems lead to load imbalance across GPUs, degrading the performance of the multi-GPU system. To address these problems, we propose Griffin, a holistic hardware-software solution to improve the performance of NUMA multi-GPU systems.
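To make the baseline concrete, the following is a minimal sketch of the first-touch scheme the abstract describes: the first GPU to touch a page triggers a CPU-to-GPU migration, the page is then pinned there, and later accesses from other GPUs are served remotely at cache-line granularity. The class and names here are illustrative assumptions, not an interface from the paper.

```python
# Hypothetical sketch of first-touch demand paging with page pinning.
# A page migrates from the CPU to the first GPU that touches it, then
# stays pinned there; other GPUs must access it over the interconnect.

class FirstTouchDirectory:
    def __init__(self):
        self.owner = {}           # page -> GPU that first touched it
        self.remote_accesses = 0  # cache-line-granularity remote traffic

    def access(self, gpu_id, page):
        if page not in self.owner:
            self.owner[page] = gpu_id  # first touch: migrate CPU -> GPU, pin
            return "migrated"
        if self.owner[page] == gpu_id:
            return "local"             # resolved in local GPU memory
        self.remote_accesses += 1      # page is pinned on another GPU
        return "remote"

d = FirstTouchDirectory()
assert d.access(0, 0x1000) == "migrated"
assert d.access(0, 0x1000) == "local"
assert d.access(1, 0x1000) == "remote"  # pinned on GPU 0, never re-migrated
```

The `remote` branch captures both drawbacks the abstract identifies: the page can never move to GPU 1 even if GPU 1 becomes its dominant user, and placement depends entirely on access order.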
Griffin introduces programmer-transparent modifications to both the IOMMU and the GPU architecture, supporting efficient runtime page migration based on locality information. In particular, Griffin employs a novel mechanism to detect and move pages between GPUs at runtime, increasing the frequency of accesses resolved locally and thereby improving performance. To ensure better load balancing across GPUs, Griffin employs a Delayed First-Touch Migration policy that distributes pages evenly across multiple GPUs. Our results on a diverse set of multi-GPU workloads show that Griffin achieves up to a 2.9× speedup on a multi-GPU system, while incurring low implementation overhead.
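As a rough illustration of the two policies named in the abstract, a delayed first-touch scheme might buffer the first few accesses to a page before placing it on the GPU that touched it most, while a counter-based runtime mechanism re-homes pages whose locality shifts. The window size, threshold, and all names below are assumptions for the sketch, not values or interfaces from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical sketch of Griffin-style policies: Delayed First-Touch
# placement plus locality-counter-driven runtime migration. The delay
# window and migration threshold are illustrative, not from the paper.

DELAY_WINDOW = 4       # accesses to observe before placing a page
MIGRATE_THRESHOLD = 8  # remote-access count that triggers re-migration

class LocalityDirectory:
    def __init__(self):
        self.owner = {}                      # page -> owning GPU
        self.pending = defaultdict(Counter)  # page -> pre-placement counts
        self.remote = defaultdict(Counter)   # page -> post-placement counts

    def access(self, gpu_id, page):
        if page not in self.owner:
            # Delayed first touch: observe accesses instead of placing
            # the page on the very first toucher.
            self.pending[page][gpu_id] += 1
            if sum(self.pending[page].values()) >= DELAY_WINDOW:
                # Place on the GPU that accessed the page most so far.
                self.owner[page] = self.pending[page].most_common(1)[0][0]
                del self.pending[page]
            return
        if self.owner[page] != gpu_id:
            # Runtime migration: re-home a page once a remote GPU's
            # access count shows it is the page's dominant user.
            self.remote[page][gpu_id] += 1
            if self.remote[page][gpu_id] >= MIGRATE_THRESHOLD:
                self.owner[page] = gpu_id
                self.remote[page].clear()

d = LocalityDirectory()
for _ in range(3):
    d.access(1, 0xA000)          # GPU 1 touches the page three times
d.access(0, 0xA000)              # GPU 0 touches it once; window is full
assert d.owner[0xA000] == 1      # placed on the GPU that used it most
```

Unlike pure first-touch pinning, placement here reflects observed locality rather than access order, and ownership can still follow a page's hottest user after placement.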


