International Symposium on Microarchitecture

GPUpd: A Fast and Scalable Multi-GPU Architecture Using Cooperative Projection and Distribution


Abstract

Graphics Processing Unit (GPU) vendors have been scaling single-GPU architectures to satisfy the ever-increasing user demands for faster graphics processing. However, as it gets extremely difficult to further scale single-GPU architectures, the vendors are aiming to achieve the scaled performance by simultaneously using multiple GPUs connected with newly developed, fast inter-GPU networks (e.g., NVIDIA NVLink, AMD XDMA). With fast inter-GPU networks, it is now promising to employ split frame rendering (SFR), which improves both frame rate and single-frame latency by assigning disjoint regions of a frame to different GPUs. Unfortunately, the scalability of current SFR implementations is seriously limited as they suffer from a large amount of redundant computation among GPUs. This paper proposes GPUpd, a novel multi-GPU architecture for fast and scalable SFR. With small hardware extensions, GPUpd introduces a new graphics pipeline stage called Cooperative Projection & Distribution (C-PD) where all GPUs cooperatively project 3D objects to the 2D screen and efficiently redistribute the objects to their corresponding GPUs. C-PD not only eliminates the redundant computation among GPUs, but also incurs minimal inter-GPU network traffic by transferring object IDs instead of mid-pipeline outcomes between GPUs. To further reduce the redistribution overheads, GPUpd minimizes inter-GPU synchronizations by implementing batching and runahead execution of draw commands. Our detailed cycle-level simulations with 8 real-world game traces show that GPUpd achieves a geomean speedup of 4.98× in single-frame latency with 16 GPUs, whereas the current SFR implementations achieve only a 3.07× geomean speedup which saturates on 4 or more GPUs.
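To make the C-PD idea in the abstract concrete, below is a minimal CPU-side sketch, not the paper's actual hardware design or interface. It assumes an SFR split of the screen into one horizontal strip per GPU; all names (Object3D, project_bounds, the inbox arrays) and the toy projection math are illustrative assumptions. The point it shows is that each GPU projects only its 1/N share of the objects and forwards only object IDs to the GPUs owning the touched strips, instead of every GPU redundantly projecting the whole scene or shipping mid-pipeline geometry over the inter-GPU network.

```cpp
// Conceptual sketch of Cooperative Projection & Distribution (C-PD),
// simulated on the CPU. Names and projection math are illustrative only.
#include <array>
#include <cstdio>
#include <vector>

constexpr int kNumGpus = 4;              // GPUs participating in split frame rendering
constexpr int kScreenH = 1080;           // screen height in pixels
constexpr int kStripH  = kScreenH / kNumGpus;  // each GPU owns one horizontal strip

struct Object3D     { float cx, cy, cz, radius; };  // bounding sphere in view space
struct ScreenBounds { int yMin, yMax; };            // projected vertical extent

// Toy projection: map a bounding sphere to its vertical screen extent.
// A real pipeline would run the full vertex/projection stage here.
ScreenBounds project_bounds(const Object3D& o) {
    float invZ = 1.0f / (o.cz + 1.0f);
    int yc = static_cast<int>((o.cy * invZ * 0.5f + 0.5f) * kScreenH);
    int yr = static_cast<int>(o.radius * invZ * kScreenH);
    ScreenBounds b{yc - yr, yc + yr};
    if (b.yMin < 0)         b.yMin = 0;
    if (b.yMax >= kScreenH) b.yMax = kScreenH - 1;
    return b;
}

int main() {
    // A small scene; in C-PD every GPU projects only a disjoint 1/N slice of it.
    std::vector<Object3D> scene = {
        { 0.0f, -0.8f, 2.0f, 0.3f}, { 0.1f, 0.0f, 3.0f, 0.2f},
        {-0.2f,  0.7f, 1.5f, 0.4f}, { 0.3f, 0.4f, 2.5f, 0.1f},
        { 0.0f, -0.2f, 4.0f, 0.5f}, { 0.2f, 0.9f, 1.2f, 0.2f},
    };

    // Per-GPU inboxes: only object IDs cross the inter-GPU network,
    // which is what keeps the redistribution traffic minimal.
    std::array<std::vector<int>, kNumGpus> inbox;

    // --- C-PD stage: cooperative projection and distribution ---
    for (int gpu = 0; gpu < kNumGpus; ++gpu) {
        for (std::size_t id = gpu; id < scene.size(); id += kNumGpus) {
            ScreenBounds b = project_bounds(scene[id]);
            int firstStrip = b.yMin / kStripH;
            int lastStrip  = b.yMax / kStripH;
            // Forward the object ID to every GPU whose strip the object touches.
            for (int owner = firstStrip; owner <= lastStrip; ++owner)
                inbox[owner].push_back(static_cast<int>(id));
        }
    }

    // --- Rendering stage: each GPU rasterizes only the objects it received ---
    for (int gpu = 0; gpu < kNumGpus; ++gpu) {
        std::printf("GPU %d (rows %d-%d) renders object IDs:", gpu,
                    gpu * kStripH, (gpu + 1) * kStripH - 1);
        for (int id : inbox[gpu]) std::printf(" %d", id);
        std::printf("\n");
    }
    return 0;
}
```

The batching and runahead execution of draw commands mentioned in the abstract are not modeled here; the sketch only illustrates why cooperative projection followed by ID-only redistribution removes the redundant per-GPU geometry work that limits current SFR implementations.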
