Streamlining GPU Applications On the Fly: Thread Divergence Elimination through Runtime Thread-Data Remapping


Abstract

Because of their tremendous computing power and remarkable cost efficiency, GPUs (graphics processing units) have quickly emerged as an influential platform for high-performance computing. However, because GPUs are designed for massive data-parallel computing, their performance is sensitive to the presence of conditional statements in a GPU application. At a conditional branch where threads diverge on which path to take, the threads taking different paths have to run serially. Such divergences often cause serious performance degradation, impairing the adoption of GPUs for many applications that contain non-trivial branches or certain types of loops. This paper presents a systematic investigation into the use of runtime thread-data remapping to solve that problem. It introduces an abstract form of GPU applications and, based on that form, describes the use of reference redirection and data layout transformation for remapping data and threads to minimize thread divergences. It discusses the major challenges for practical deployment of the remapping techniques, most notably the conflict between the large remapping overhead and the need for the remapping to happen on the fly, since thread divergences depend on runtime values. It addresses that challenge by proposing a CPU-GPU pipelining scheme and a label-assign-move (LAM) algorithm that hide virtually all of the remapping overhead. Finally, it reports significant performance improvements produced by the remapping for a set of GPU applications, demonstrating the potential of the techniques for streamlining GPU applications on the fly.
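
The two remapping mechanisms named in the abstract can be illustrated with a small CPU-side simulation. The following is a minimal sketch under stated assumptions, not the paper's LAM algorithm or its pipelining scheme: it uses a plain stable partition to group threads by branch outcome, and the helper names (`divergent_warps`, `redirect_references`, `transform_layout`) and the odd/even workload are hypothetical. The warp size of 32 matches NVIDIA GPUs.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def divergent_warps(conditions, warp_size=WARP_SIZE):
    """Count warps in which threads disagree on the branch outcome."""
    count = 0
    for start in range(0, len(conditions), warp_size):
        warp = conditions[start:start + warp_size]
        if any(warp) and not all(warp):
            count += 1
    return count


def redirect_references(conditions):
    """Reference redirection (illustrative): build an index array so that
    thread t works on data[index[t]]. A stable partition puts all items
    that take the branch first, then all items that do not."""
    taken = [i for i, c in enumerate(conditions) if c]
    not_taken = [i for i, c in enumerate(conditions) if not c]
    return taken + not_taken


def transform_layout(data, index):
    """Data layout transformation (illustrative): physically reorder the
    data so that thread t reads new_data[t] directly."""
    return [data[i] for i in index]


# Hypothetical workload: the branch is taken when the element is odd,
# so every warp of 32 consecutive elements is divergent.
data = list(range(128))
conditions = [x % 2 == 1 for x in data]

index = redirect_references(conditions)
remapped_conditions = [conditions[i] for i in index]
new_data = transform_layout(data, index)

before = divergent_warps(conditions)
after = divergent_warps(remapped_conditions)
print(before, after)
```

In this sketch, redirection leaves the data in place and pays for an extra indirection on every access, while the layout transformation pays once to move the data. The remapping itself depends on runtime values, which is the overhead the abstract says the pipelining scheme is meant to hide.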

Bibliographic details

Source: ACM International Conference on Supercomputing | 2010 | 11 pages
Authors: Eddy Z. Zhang; Yunlian Jiang; Ziyu Guo; Xipeng Shen
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords: GPGPU; thread divergence; thread-data remapping; CPU-GPU pipelining; data transformation

Similar literature


1. On-GPU thread-data remapping for nested branch divergence [J]. Huanxin Lin, Cho-Li Wang. Journal of Parallel and Distributed Computing, 2020, May issue.
2. On-GPU Thread-Data Remapping for Branch Divergence Reduction [J]. Huanxin Lin, Cho-Li Wang, Hongyuan Liu. ACM Transactions on Architecture and Code Optimization, 2018, No. 3.
3. Efficient low-latency packet processing using On-GPU Thread-Data Remapping [J]. Huanxin Lin, Cho-Li Wang. Journal of Parallel and Distributed Computing, 2019, Nov. issue.
4. Streamlining GPU Applications On the Fly: Thread Divergence Elimination through Runtime Thread-Data Remapping [C]. Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Xipeng Shen. 24th ACM International Conference on Supercomputing, 2010.
5. On-GPU Thread-Data Remapping for Branch Divergence Reduction [O]. Huanxin Lin, Cho-Li Wang, Hongyuan Liu. 2018.
6. Machining Elimination through Application of Thread Forming Fasteners in Net-Shaped Cast Holes [R]. Cleaver, R., Cleaver, T., Talbott, R. 2012.
