Journal: Journal of Parallel and Distributed Computing

On-GPU thread-data remapping for nested branch divergence

Abstract

Nested branches are common in applications with decision trees. The more layers in the branch nest, the greater the slowdown caused by nested branch divergence on GPUs. Since inner branches are impractical to evaluate on the host side, thread-data remapping via GPU shared memory is so far the most suitable solution. However, existing solutions cannot handle inner branches directly, because the behavior of the GPU barrier function is undefined when it is executed within branch statements. Race conditions therefore need to be prevented without using the barrier function. Targeting nested divergence, we propose NeX, a nested extension scheme featuring an inter-thread protocol that supports sub-workgroup synchronization. We further exploit the on-the-fly nature of the Head-or-Tail (HoT) algorithm and propose HoT2, which offers enhanced flexibility in wavefront scheduling. Evaluated on four GPU models, including NVIDIA Volta and Turing, HoT2 proves to be more efficient. For benchmarks with branch nests up to five layers deep, NeX further boosts performance by up to 1.56x.
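To make the idea of thread-data remapping concrete, the following is a minimal host-side sketch in Python. It is not the paper's NeX, HoT, or HoT2 scheme, and it does not model shared memory or sub-workgroup synchronization. It only illustrates the general principle under stated assumptions: each data item is labeled with the branch path it takes through a (possibly nested) branch, threads are grouped into warps of an assumed width of 32, and a stable sort by path stands in for the remapping step. The names `divergent_warps`, `remap_by_path`, and `WARP_SIZE` are hypothetical and chosen for illustration.

```python
from typing import List, Sequence

WARP_SIZE = 32  # assumed wavefront width; the real value is hardware-dependent


def divergent_warps(paths: Sequence[int], warp_size: int = WARP_SIZE) -> int:
    """Count warps whose threads do not all follow the same branch path."""
    count = 0
    for start in range(0, len(paths), warp_size):
        warp = paths[start:start + warp_size]
        if len(set(warp)) > 1:
            count += 1
    return count


def remap_by_path(paths: Sequence[int]) -> List[int]:
    """Return a thread -> data-index mapping that groups equal branch paths.

    A stable sort is used here as a simple stand-in for an on-GPU
    remapping algorithm; thread t processes data item mapping[t].
    """
    return sorted(range(len(paths)), key=lambda i: paths[i])


if __name__ == "__main__":
    # Two-layer branch nest: the outer condition is i % 2, the inner one is
    # (i // 2) % 2, so every data item follows one of four paths.
    paths = [2 * (i % 2) + (i // 2) % 2 for i in range(128)]
    mapping = remap_by_path(paths)
    remapped = [paths[i] for i in mapping]
    print("divergent warps before:", divergent_warps(paths))
    print("divergent warps after: ", divergent_warps(remapped))
```

In this toy input every warp initially mixes all four paths, while after remapping each warp holds items from a single path. A real on-GPU implementation has to build the mapping cooperatively among threads, which is where the synchronization issues described in the abstract arise.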
