Journal: ACM Transactions on Architecture and Code Optimization

On-GPU Thread-Data Remapping for Branch Divergence Reduction

Abstract

General-purpose GPU computing (GPGPU) plays an increasingly vital role in high-performance computing and in other areas such as deep learning. However, because of the SIMD execution model, branch divergence lowers the efficiency of conditional branching on GPUs and hinders the development of GPGPU. To reduce branch divergence on the spot at runtime, we propose the first on-GPU thread-data remapping scheme. Before kernel launch, our solution inserts code into GPU kernels immediately before each target branch to acquire actual runtime divergence information. GPU software threads can be remapped to datasets multiple times during a single kernel execution. We propose two thread-data remapping algorithms tailored to the GPU architecture. Effective on two generations of GPUs from both NVIDIA and AMD, our solution achieves speedups of up to 2.718 on third-party benchmarks. We also implement three GPGPU frontier benchmarks from areas including computer vision, algorithmic trading, and data analytics. These benchmarks are hindered by more complex divergence coupled with different memory access patterns, and our solution outperforms the traditional thread-data remapping scheme in all cases. As a compiler-assisted runtime solution, it can better reduce divergence for divergent applications that currently gain little acceleration on GPUs.
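
The general idea of thread-data remapping can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions, not either of the two algorithms the paper proposes: the group width `WARP_SIZE`, the helper names `divergent_warps` and `remap`, and the alternating branch pattern are all hypothetical choices made for illustration. It groups data elements by branch outcome so that each SIMD group of threads sees a uniform branch direction.

```python
WARP_SIZE = 32  # assumed SIMD group width, chosen only for illustration


def divergent_warps(predicates, warp_size=WARP_SIZE):
    """Count SIMD groups whose threads disagree on the branch outcome."""
    count = 0
    for start in range(0, len(predicates), warp_size):
        warp = predicates[start:start + warp_size]
        if any(warp) and not all(warp):
            count += 1
    return count


def remap(predicates):
    """Return a thread-to-data mapping that groups equal branch outcomes.

    Thread i processes data element mapping[i]. Python's sort is stable,
    so the original order is preserved within each outcome group.
    """
    return sorted(range(len(predicates)), key=lambda i: predicates[i])


# Hypothetical branch outcomes: every even-indexed element takes the branch.
predicates = [i % 2 == 0 for i in range(128)]

mapping = remap(predicates)
remapped = [predicates[j] for j in mapping]

before = divergent_warps(predicates)  # groups that diverge with the identity mapping
after = divergent_warps(remapped)     # groups that diverge after remapping

print(before, after)
```

In this toy input every group diverges under the identity mapping, and none does after remapping. A real on-GPU scheme would also have to weigh the cost of computing the mapping and its effect on memory access patterns, which this sketch ignores.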
