On-GPU Thread-Data Remapping for Branch Divergence Reduction

Lin Huanxin; Wang Cho-Li; Liu Hongyuan

Abstract

General-purpose GPU computing (GPGPU) plays an increasingly vital role in high-performance computing and in other areas such as deep learning. However, branch divergence, which arises from the SIMD execution model, lowers the efficiency of conditional branching on GPUs and hinders the development of GPGPU. To reduce branch divergence on the spot at runtime, we propose the first on-GPU thread-data remapping scheme. Before kernel launch, our solution inserts code into GPU kernels immediately before each target branch to acquire actual runtime divergence information. GPU software threads can be remapped to data multiple times during a single kernel execution. We propose two thread-data remapping algorithms tailored to the GPU architecture. Effective on two generations of GPUs from both NVIDIA and AMD, our solution achieves speedups of up to 2.718× on third-party benchmarks. We also implement three frontier GPGPU benchmarks from computer vision, algorithmic trading, and data analytics. These benchmarks are hindered by more complex divergence coupled with different memory access patterns, and our solution outperforms the traditional thread-data remapping scheme in all cases. As a compiler-assisted runtime solution, it can better reduce divergence in divergent applications that currently gain little acceleration on GPUs.
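The general idea behind thread-data remapping can be illustrated with a small CPU-side simulation. This is only a sketch under stated assumptions, not the paper's on-GPU algorithms: the warp width of 32, the function names `count_divergent_warps`, `remap_threads_to_data`, and `demo`, and the alternating workload are all hypothetical choices made for illustration. It counts how many warps contain threads that take different branch directions, then reassigns data indices to threads so that equal branch outcomes sit next to each other.

```python
WARP_SIZE = 32  # assumed warp width (32 threads on NVIDIA GPUs)


def count_divergent_warps(predicates, warp_size=WARP_SIZE):
    """Count warps whose threads do not all take the same branch direction."""
    divergent = 0
    for start in range(0, len(predicates), warp_size):
        warp = predicates[start:start + warp_size]
        # A warp diverges when it holds both taken and not-taken threads.
        if any(warp) and not all(warp):
            divergent += 1
    return divergent


def remap_threads_to_data(predicates):
    """Return a thread -> data-index mapping that groups equal outcomes.

    Illustrative stable partition (an assumption, not the paper's method):
    data items whose branch is taken come first, then the rest.
    """
    taken = [i for i, p in enumerate(predicates) if p]
    not_taken = [i for i, p in enumerate(predicates) if not p]
    return taken + not_taken


def demo(num_items=256):
    """Compare divergent-warp counts before and after remapping."""
    # Hypothetical workload: the branch outcome alternates per data item,
    # so every warp diverges under the identity thread-data mapping.
    data = list(range(num_items))
    predicates = [x % 2 == 0 for x in data]

    mapping = remap_threads_to_data(predicates)
    remapped = [predicates[i] for i in mapping]
    return count_divergent_warps(predicates), count_divergent_warps(remapped)


if __name__ == "__main__":
    before, after = demo()
    print(f"divergent warps before remapping: {before}, after: {after}")
```

With the default 256 items, `demo()` returns `(8, 0)`: all eight warps diverge under the original mapping and none do after the partition. A partition of this kind leaves at most one mixed warp, at the boundary between the two groups. A real GPU scheme must also weigh the cost of the remapping step and its effect on memory access patterns, which this simulation ignores.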

Bibliographic details

Source
ACM Transactions on Architecture and Code Optimization, 2018, Issue 3, 24 pages
Authors
Lin Huanxin; Wang Cho-Li; Liu Hongyuan
Affiliation
University of Hong Kong, Room 414, Chow Yei Ching Building, Hong Kong, People's Republic of China (listed for each of the three authors)
Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords
Parallel computing; GPGPU; SIMD; branch divergence

