IEEE Conference on High Performance Extreme Computing

Algorithm Flattening: Complete branch elimination for GPU requires a paradigm shift from CPU thinking


Abstract

Graphics processing units (GPUs) have inadvertently become supercomputers in their own right, benefiting applications well beyond graphics. Speedups of several orders of magnitude have been achieved in scientific computing, co-processing, and similar domains. However, the Single Instruction Multiple Data (SIMD) design of GPUs is extremely sensitive to thread divergence, so much so that for many applications thread divergence all but eliminates the performance gain from GPUs. This problem has pushed general-purpose GPU computing toward finding “appropriate” applications to accelerate, rather than accelerating the applications that actually need performance improvements. Thread divergence is generally caused by branches. Previous research has addressed reducing branches, but none of that work aims to eliminate branches entirely, because the methods required for complete branch elimination are a drastic de-optimization for CPUs. We present Algorithm Flattening (AF), a CPU de-optimization that removes all branches and yields a significant optimization for GPU-accelerated applications. AF eliminates thread divergence, substantially decreases execution time, enables GPU implementations of algorithms that previously could not fully utilize the GPU, and produces deterministic performance. AF replaces branches with a reduced equation and achieves a substantial speedup over algorithms and applications that were already GPU-accelerated. We believe that AF will have a significant impact on high performance computing, as it is a long-needed solution that enables unprecedented use of GPUs for general-purpose applications.
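The core idea of replacing a branch with an equation can be illustrated with a small sketch. The example below is not the authors' AF procedure, whose construction of the "reduced equation" may differ; it is a generic branch-to-arithmetic rewrite on a hypothetical piecewise function (`piecewise_branchy`, `piecewise_flat`, and `forms_agree` are illustrative names, not from the paper). The condition is turned into a 0/1 mask and both arms are blended, so every input follows the same instruction sequence at the source level. Plain C++ is used so it runs on a CPU; the same function body could be marked `__device__` in a CUDA kernel.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical piecewise function, for illustration only (not from the paper).
// Assumes |x| is small enough that -2*x and 3*x + 1 do not overflow int32_t.

// Branchy form: on a GPU, threads of one warp that disagree on (x < 0)
// may diverge, executing the two paths one after the other.
int32_t piecewise_branchy(int32_t x) {
    if (x < 0) {
        return -2 * x;
    }
    return 3 * x + 1;
}

// Flattened form: the condition becomes a 0/1 mask and both arms are combined
// arithmetically. Both arms are always evaluated (extra work a CPU would
// normally skip), so each arm must be safe to compute for every x.
int32_t piecewise_flat(int32_t x) {
    const int32_t m = static_cast<int32_t>(x < 0);  // 1 if x < 0, else 0
    return m * (-2 * x) + (1 - m) * (3 * x + 1);
}

// Returns true if both forms give the same result for every x in [lo, hi].
bool forms_agree(int32_t lo, int32_t hi) {
    assert(lo <= hi);
    // A 64-bit counter avoids overflow when hi is INT32_MAX.
    for (int64_t x = lo; x <= hi; ++x) {
        const int32_t xi = static_cast<int32_t>(x);
        if (piecewise_branchy(xi) != piecewise_flat(xi)) {
            return false;
        }
    }
    return true;
}
```

For example, `piecewise_flat(-3)` gives 6 and `piecewise_flat(4)` gives 13, matching the branchy version. Whether either form compiles to branch-free machine code is up to the compiler (it may already turn the `if` into a select), so the generated code (for example, PTX or SASS on an NVIDIA GPU) would need to be inspected to confirm the branch is gone.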