
Optimal loop unrolling for GPGPU programs



Abstract

Graphics Processing Units (GPUs) are massively parallel, many-core processors with tremendous computational power and very high memory bandwidth. With the advent of general-purpose programming models such as NVIDIA's CUDA and the new standard OpenCL, general-purpose programming using GPUs (GPGPU) has become very popular. However, the GPU architecture and programming model have brought with them many new challenges and opportunities for compiler optimizations. One such classical optimization is loop unrolling. Current GPU compilers perform only limited loop unrolling. In this paper, we attempt to understand the impact of loop unrolling on GPGPU programs. We develop a semi-automatic, compile-time approach for identifying optimal unroll factors for suitable loops in GPGPU programs. In addition, we propose techniques for reducing the number of unroll factors evaluated, based on the characteristics of the program being compiled and the device being compiled for. We use these techniques to evaluate the effect of loop unrolling on a range of GPGPU programs and show that we correctly identify the optimal unroll factors. The optimized versions run up to 70 percent faster than the unoptimized versions.
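
The abstract does not spell out how the set of unroll factors is narrowed, so the following is only a rough sketch of what such pruning could look like, not the paper's method. It assumes two hypothetical rules: a program-side rule (keep only factors that divide the loop's trip count evenly) and a device-side rule (drop factors whose estimated register use exceeds a per-thread budget). The function names, the linear register estimate, and all parameters are illustrative assumptions.

```python
def candidate_unroll_factors(trip_count, base_regs, regs_per_iteration,
                             max_regs_per_thread):
    """Return the unroll factors worth evaluating for one loop.

    Both pruning rules are hypothetical stand-ins for the program- and
    device-based characteristics mentioned in the abstract.
    """
    candidates = []
    for factor in range(1, trip_count + 1):
        # Program-side rule (assumed): skip factors that would need a
        # remainder loop because they do not divide the trip count.
        if trip_count % factor != 0:
            continue
        # Device-side rule (assumed): estimate register use as growing
        # linearly with the factor and skip anything over the budget.
        estimated_regs = base_regs + regs_per_iteration * factor
        if estimated_regs > max_regs_per_thread:
            continue
        candidates.append(factor)
    return candidates


def pick_best_factor(candidates, measure):
    """Pick the factor with the lowest cost reported by `measure`.

    `measure` is a caller-supplied function (for example, a timed kernel
    run); it is a placeholder here, not a real profiling API.
    """
    return min(candidates, key=measure)
```

For a loop with 16 iterations, calling `candidate_unroll_factors(16, 10, 4, 63)` leaves only a handful of factors to compile and time, rather than all 16 possibilities.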
