24th ACM International Conference on Supercomputing (ICS 2010)

Streamlining GPU Applications On the Fly: Thread Divergence Elimination through Runtime Thread-Data Remapping


Abstract

Because of their tremendous computing power and remarkable cost efficiency, GPUs (graphics processing units) have quickly emerged as an influential platform for high performance computing. However, as GPUs are designed for massive data-parallel computing, their performance is sensitive to the presence of conditional statements in a GPU application. On a conditional branch where threads diverge in which path to take, the threads taking different paths have to run serially. Such divergences often cause serious performance degradation, impairing the adoption of GPUs for many applications that contain non-trivial branches or certain types of loops.

This paper presents a systematic investigation into the employment of runtime thread-data remapping for solving that problem. It introduces an abstract form of GPU applications, based on which it describes the use of reference redirection and data layout transformation for remapping data and threads to minimize thread divergences. It discusses the major challenges for practical deployment of the remapping techniques, most notably the conflict between the large remapping overhead and the need for the remapping to happen on the fly because thread divergences depend on runtime values. It offers a solution to that challenge by proposing a CPU-GPU pipelining scheme and a label-assign-move (LAM) algorithm to virtually hide all the remapping overhead. Finally, it reports significant performance improvements produced by the remapping for a set of GPU applications, demonstrating the potential of the techniques for streamlining GPU applications on the fly.
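
To make the core idea concrete, below is a minimal CUDA sketch (not the paper's implementation) of the divergence problem and of reference redirection: rather than each thread processing the element with its own index, it reads its element through a remapping array built so that threads in the same warp take the same branch. The kernel names, the remap[] array, and the naive host-side grouping are illustrative assumptions; the paper's LAM algorithm and CPU-GPU pipelining, which hide the remapping cost, are not shown.

#include <vector>
#include <cuda_runtime.h>

// Divergent baseline: within a warp, threads whose data fall on different
// sides of the branch execute the two paths serially.
__global__ void divergentKernel(const float *data, float *out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    float v = data[tid];
    if (v > 0.0f)              // branch outcome depends on a runtime value
        out[tid] = v * 2.0f;   // path A
    else
        out[tid] = -v;         // path B
}

// Reference redirection: thread tid works on element remap[tid]; the host
// builds remap[] so that consecutive threads (one warp) see elements that
// take the same path, removing intra-warp divergence.
__global__ void remappedKernel(const float *data, const int *remap,
                               float *out, int n) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;
    int idx = remap[tid];
    float v = data[idx];
    if (v > 0.0f)
        out[idx] = v * 2.0f;
    else
        out[idx] = -v;
}

int main() {
    const int n = 1 << 20;
    std::vector<float> h_data(n);
    for (int i = 0; i < n; ++i)
        h_data[i] = (i % 3 == 0) ? -1.0f : 1.0f;   // runtime values drive the branch

    // Label elements by branch outcome and group indices per path
    // (a naive CPU stand-in for a label-assign-move style reordering).
    std::vector<int> h_remap;
    h_remap.reserve(n);
    for (int i = 0; i < n; ++i) if (h_data[i] >  0.0f) h_remap.push_back(i);
    for (int i = 0; i < n; ++i) if (h_data[i] <= 0.0f) h_remap.push_back(i);

    float *d_data, *d_out; int *d_remap;
    cudaMalloc(&d_data,  n * sizeof(float));
    cudaMalloc(&d_out,   n * sizeof(float));
    cudaMalloc(&d_remap, n * sizeof(int));
    cudaMemcpy(d_data,  h_data.data(),  n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_remap, h_remap.data(), n * sizeof(int),   cudaMemcpyHostToDevice);

    int threads = 256, blocks = (n + threads - 1) / threads;
    remappedKernel<<<blocks, threads>>>(d_data, d_remap, d_out, n);
    cudaDeviceSynchronize();

    cudaFree(d_data); cudaFree(d_out); cudaFree(d_remap);
    return 0;
}

Note that redirection alone scatters memory accesses (data[idx] is no longer contiguous across a warp), which is where the second technique named in the abstract, data layout transformation, comes in: physically reordering the data restores coalesced access at the cost of a copy, and the proposed CPU-GPU pipelining is what keeps that cost off the critical path.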
