Conference: IEEE International Symposium on High Performance Computer Architecture (HPCA)

Griffin: Hardware-Software Support for Efficient Page Migration in Multi-GPU Systems



Abstract

As transistor scaling becomes increasingly difficult, scaling the core count on a single GPU chip has also become extremely challenging. As the volume of data processed by today's highly parallel workloads continues to grow, we need scalable solutions that can keep up with this demand. To meet the needs of modern parallel applications, multi-GPU systems offer a promising path to high performance and large memory capacity. However, multi-GPU systems suffer from performance issues associated with GPU-to-GPU communication and data sharing, which severely limit their benefits. Programming multi-GPU systems has been made considerably simpler by the advent of Unified Memory, which enables runtime migration of pages to the GPU on demand. Current multi-GPU systems rely on a first-touch Demand Paging scheme, in which a memory page is migrated from the CPU to the GPU on the first GPU access to that page. The data-sharing nature of GPU applications makes deploying an efficient, programmer-transparent mechanism for inter-GPU page migration challenging. Therefore, following the initial CPU-to-GPU page migration, the page is pinned on that GPU. Future accesses to this page from other GPUs happen at a cache-line granularity; pages are not transferred between GPUs without significant programmer intervention. We observe that this mechanism suffers from two major drawbacks: 1) imbalance in the page distribution across multiple GPUs, and 2) inability to move a page to the GPU that uses it most frequently. Both problems lead to load imbalance across GPUs, degrading the performance of the multi-GPU system. To address these problems, we propose Griffin, a holistic hardware-software solution to improve the performance of NUMA multi-GPU systems.
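To make the baseline concrete, the following is a minimal sketch of the first-touch scheme the abstract describes: the first GPU to touch a page triggers a CPU-to-GPU migration, the page is then pinned there, and later accesses from other GPUs are served remotely at cache-line granularity. The class and names here are illustrative assumptions, not an interface from the paper.

```python
# Hypothetical sketch of first-touch demand paging with page pinning.
# A page migrates from the CPU to the first GPU that touches it, then
# stays pinned there; other GPUs must access it over the interconnect.

class FirstTouchDirectory:
    def __init__(self):
        self.owner = {}           # page -> GPU that first touched it
        self.remote_accesses = 0  # cache-line-granularity remote traffic

    def access(self, gpu_id, page):
        if page not in self.owner:
            self.owner[page] = gpu_id  # first touch: migrate CPU -> GPU, pin
            return "migrated"
        if self.owner[page] == gpu_id:
            return "local"             # resolved in local GPU memory
        self.remote_accesses += 1      # page is pinned on another GPU
        return "remote"

d = FirstTouchDirectory()
assert d.access(0, 0x1000) == "migrated"
assert d.access(0, 0x1000) == "local"
assert d.access(1, 0x1000) == "remote"  # pinned on GPU 0, never re-migrated
```

The `remote` branch captures both drawbacks the abstract identifies: the page can never move to GPU 1 even if GPU 1 becomes its dominant user, and placement depends entirely on access order.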
Griffin introduces programmer-transparent modifications to both the IOMMU and the GPU architecture, supporting efficient runtime page migration based on locality information. In particular, Griffin employs a novel mechanism to detect and move pages between GPUs at runtime, increasing the frequency of accesses resolved locally and thereby improving performance. To ensure better load balancing across GPUs, Griffin employs a Delayed First-Touch Migration policy that distributes pages evenly across multiple GPUs. Our results on a diverse set of multi-GPU workloads show that Griffin achieves up to a 2.9× speedup on a multi-GPU system, while incurring low implementation overhead.
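As a rough illustration of the two policies named in the abstract, a delayed first-touch scheme might buffer the first few accesses to a page before placing it on the GPU that touched it most, while a counter-based runtime mechanism re-homes pages whose locality shifts. The window size, threshold, and all names below are assumptions for the sketch, not values or interfaces from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical sketch of Griffin-style policies: Delayed First-Touch
# placement plus locality-counter-driven runtime migration. The delay
# window and migration threshold are illustrative, not from the paper.

DELAY_WINDOW = 4       # accesses to observe before placing a page
MIGRATE_THRESHOLD = 8  # remote-access count that triggers re-migration

class LocalityDirectory:
    def __init__(self):
        self.owner = {}                      # page -> owning GPU
        self.pending = defaultdict(Counter)  # page -> pre-placement counts
        self.remote = defaultdict(Counter)   # page -> post-placement counts

    def access(self, gpu_id, page):
        if page not in self.owner:
            # Delayed first touch: observe accesses instead of placing
            # the page on the very first toucher.
            self.pending[page][gpu_id] += 1
            if sum(self.pending[page].values()) >= DELAY_WINDOW:
                # Place on the GPU that accessed the page most so far.
                self.owner[page] = self.pending[page].most_common(1)[0][0]
                del self.pending[page]
            return
        if self.owner[page] != gpu_id:
            # Runtime migration: re-home a page once a remote GPU's
            # access count shows it is the page's dominant user.
            self.remote[page][gpu_id] += 1
            if self.remote[page][gpu_id] >= MIGRATE_THRESHOLD:
                self.owner[page] = gpu_id
                self.remote[page].clear()

d = LocalityDirectory()
for _ in range(3):
    d.access(1, 0xA000)          # GPU 1 touches the page three times
d.access(0, 0xA000)              # GPU 0 touches it once; window is full
assert d.owner[0xA000] == 1      # placed on the GPU that used it most
```

Unlike pure first-touch pinning, placement here reflects observed locality rather than access order, and ownership can still follow a page's hottest user after placement.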


