International Symposium on Microarchitecture

GPUpd: A Fast and Scalable Multi-GPU Architecture Using Cooperative Projection and Distribution


Abstract

Graphics Processing Unit (GPU) vendors have been scaling single-GPU architectures to satisfy the ever-increasing user demands for faster graphics processing. However, as it gets extremely difficult to further scale single-GPU architectures, the vendors are aiming to achieve the scaled performance by simultaneously using multiple GPUs connected with newly developed, fast inter-GPU networks (e.g., NVIDIA NVLink, AMD XDMA). With fast inter-GPU networks, it is now promising to employ split frame rendering (SFR), which improves both frame rate and single-frame latency by assigning disjoint regions of a frame to different GPUs. Unfortunately, the scalability of current SFR implementations is seriously limited as they suffer from a large amount of redundant computation among GPUs. This paper proposes GPUpd, a novel multi-GPU architecture for fast and scalable SFR. With small hardware extensions, GPUpd introduces a new graphics pipeline stage called Cooperative Projection & Distribution (C-PD) where all GPUs cooperatively project 3D objects to the 2D screen and efficiently redistribute the objects to their corresponding GPUs. C-PD not only eliminates the redundant computation among GPUs, but also incurs minimal inter-GPU network traffic by transferring object IDs instead of mid-pipeline outcomes between GPUs. To further reduce the redistribution overheads, GPUpd minimizes inter-GPU synchronizations by implementing batching and runahead execution of draw commands. Our detailed cycle-level simulations with 8 real-world game traces show that GPUpd achieves a geomean speedup of 4.98× in single-frame latency with 16 GPUs, whereas the current SFR implementations achieve only a 3.07× geomean speedup which saturates on 4 or more GPUs.
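To make the C-PD idea in the abstract concrete, below is a minimal CPU-side sketch, not the paper's actual hardware design or interface. It assumes an SFR split of the screen into one horizontal strip per GPU; all names (Object3D, project_bounds, the inbox arrays) and the toy projection math are illustrative assumptions. The point it shows is that each GPU projects only its 1/N share of the objects and forwards only object IDs to the GPUs owning the touched strips, instead of every GPU redundantly projecting the whole scene or shipping mid-pipeline geometry over the inter-GPU network.

```cpp
// Conceptual sketch of Cooperative Projection & Distribution (C-PD),
// simulated on the CPU. Names and projection math are illustrative only.
#include <array>
#include <cstdio>
#include <vector>

constexpr int kNumGpus = 4;              // GPUs participating in split frame rendering
constexpr int kScreenH = 1080;           // screen height in pixels
constexpr int kStripH  = kScreenH / kNumGpus;  // each GPU owns one horizontal strip

struct Object3D     { float cx, cy, cz, radius; };  // bounding sphere in view space
struct ScreenBounds { int yMin, yMax; };            // projected vertical extent

// Toy projection: map a bounding sphere to its vertical screen extent.
// A real pipeline would run the full vertex/projection stage here.
ScreenBounds project_bounds(const Object3D& o) {
    float invZ = 1.0f / (o.cz + 1.0f);
    int yc = static_cast<int>((o.cy * invZ * 0.5f + 0.5f) * kScreenH);
    int yr = static_cast<int>(o.radius * invZ * kScreenH);
    ScreenBounds b{yc - yr, yc + yr};
    if (b.yMin < 0)         b.yMin = 0;
    if (b.yMax >= kScreenH) b.yMax = kScreenH - 1;
    return b;
}

int main() {
    // A small scene; in C-PD every GPU projects only a disjoint 1/N slice of it.
    std::vector<Object3D> scene = {
        { 0.0f, -0.8f, 2.0f, 0.3f}, { 0.1f, 0.0f, 3.0f, 0.2f},
        {-0.2f,  0.7f, 1.5f, 0.4f}, { 0.3f, 0.4f, 2.5f, 0.1f},
        { 0.0f, -0.2f, 4.0f, 0.5f}, { 0.2f, 0.9f, 1.2f, 0.2f},
    };

    // Per-GPU inboxes: only object IDs cross the inter-GPU network,
    // which is what keeps the redistribution traffic minimal.
    std::array<std::vector<int>, kNumGpus> inbox;

    // --- C-PD stage: cooperative projection and distribution ---
    for (int gpu = 0; gpu < kNumGpus; ++gpu) {
        for (std::size_t id = gpu; id < scene.size(); id += kNumGpus) {
            ScreenBounds b = project_bounds(scene[id]);
            int firstStrip = b.yMin / kStripH;
            int lastStrip  = b.yMax / kStripH;
            // Forward the object ID to every GPU whose strip the object touches.
            for (int owner = firstStrip; owner <= lastStrip; ++owner)
                inbox[owner].push_back(static_cast<int>(id));
        }
    }

    // --- Rendering stage: each GPU rasterizes only the objects it received ---
    for (int gpu = 0; gpu < kNumGpus; ++gpu) {
        std::printf("GPU %d (rows %d-%d) renders object IDs:", gpu,
                    gpu * kStripH, (gpu + 1) * kStripH - 1);
        for (int id : inbox[gpu]) std::printf(" %d", id);
        std::printf("\n");
    }
    return 0;
}
```

The batching and runahead execution of draw commands mentioned in the abstract are not modeled here; the sketch only illustrates why cooperative projection followed by ID-only redistribution removes the redundant per-GPU geometry work that limits current SFR implementations.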
