Venue: ACM International Conference on Supercomputing

Streamlining GPU Applications On the Fly: Thread Divergence Elimination through Runtime Thread-Data Remapping


Abstract

Because of their tremendous computing power and remarkable cost efficiency, GPUs (graphics processing units) have quickly emerged as an influential platform for high-performance computing. However, because GPUs are designed for massively data-parallel computing, their performance is sensitive to conditional statements in a GPU application. On a conditional branch where threads diverge in which path they take, the threads taking different paths have to run serially. Such divergences often cause serious performance degradation, impairing the adoption of GPUs for many applications that contain non-trivial branches or certain types of loops. This paper presents a systematic investigation into the use of runtime thread-data remapping to solve that problem. It introduces an abstract form of GPU applications, on the basis of which it describes the use of reference redirection and data layout transformation for remapping data and threads to minimize thread divergences. It discusses the major challenges for practical deployment of the remapping techniques, most notably the conflict between the large remapping overhead and the need for the remapping to happen on the fly, because thread divergences depend on runtime values. It addresses this challenge by proposing a CPU-GPU pipelining scheme and a label-assign-move (LAM) algorithm that hide virtually all of the remapping overhead. Finally, it reports significant performance improvements produced by the remapping for a set of GPU applications, demonstrating the potential of the techniques for streamlining GPU applications on the fly.
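The idea of remapping threads to data so that threads grouped together take the same branch can be illustrated with a small CPU-side simulation. This is only a sketch under stated assumptions, not the paper's method: it assumes threads execute in groups (warps) of 32, uses a made-up branch condition (`branch_label`), and groups elements with a plain stable sort rather than the label-assign-move (LAM) algorithm, whose details the abstract does not give. All function names are hypothetical.

```python
WARP_SIZE = 32  # assumption: threads are scheduled in groups of 32


def branch_label(value):
    """Hypothetical branch condition: the path a thread takes for this input."""
    return value % 2 == 0


def count_divergent_warps(data, index_map, warp_size=WARP_SIZE):
    """Count warps whose threads do not all take the same branch.

    Thread t reads data[index_map[t]], which models reference redirection:
    the data stays in place and only the thread-to-element mapping changes.
    """
    divergent = 0
    for start in range(0, len(index_map), warp_size):
        warp = index_map[start:start + warp_size]
        labels = {branch_label(data[i]) for i in warp}
        if len(labels) > 1:
            divergent += 1
    return divergent


def remap_by_label(data):
    """Build an index map that places elements with equal labels side by side.

    A stable sort is used here for simplicity; it stands in for whatever
    grouping algorithm a real system would use.
    """
    return sorted(range(len(data)), key=lambda i: branch_label(data[i]))


def transform_layout(data, index_map):
    """Model data layout transformation: physically reorder the elements."""
    return [data[i] for i in index_map]


# Alternating even/odd inputs: every warp mixes both branch paths.
data = list(range(128))
identity = list(range(len(data)))
remapped = remap_by_label(data)
reordered = transform_layout(data, remapped)

before = count_divergent_warps(data, identity)
after_redirect = count_divergent_warps(data, remapped)
after_layout = count_divergent_warps(reordered, identity)
print(before, after_redirect, after_layout)
```

In this toy input every warp diverges under the original mapping and none does after remapping. Reference redirection leaves the data where it is and pays for an extra indirection on each access, while layout transformation moves the data once so that threads can index it directly; which is cheaper depends on the application, and the sketch makes no attempt to model that cost.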
