Optimal loop unrolling for GPGPU programs

Abstract

Graphics Processing Units (GPUs) are massively parallel, many-core processors with tremendous computational power and very high memory bandwidth. With the advent of general-purpose programming models such as NVIDIA's CUDA and the new OpenCL standard, general-purpose programming on GPUs (GPGPU) has become very popular. However, the GPU architecture and programming model bring with them many new challenges and opportunities for compiler optimizations. One such classical optimization is loop unrolling, which current GPU compilers perform only to a limited extent. In this paper, we attempt to understand the impact of loop unrolling on GPGPU programs. We develop a semi-automatic, compile-time approach for identifying optimal unroll factors for suitable loops in GPGPU programs. In addition, we propose techniques for reducing the number of unroll factors that must be evaluated, based on the characteristics of the program being compiled and of the target device. We use these techniques to evaluate the effect of loop unrolling on a range of GPGPU programs and show that we correctly identify the optimal unroll factors. The optimized versions run up to 70 percent faster than the unoptimized versions.
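
For readers unfamiliar with the term, the following is a minimal sketch of what an "unroll factor" means. It is plain host-side C++ rather than GPU code, and it is not the paper's method: the function name `scale_unrolled` and the template parameter `Factor` are hypothetical names chosen for illustration. The loop body is replicated `Factor` times per trip, and a remainder loop handles the leftover elements.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the paper): multiplies every element by k,
// processing `Factor` elements per trip of the main loop.
template <std::size_t Factor>
void scale_unrolled(std::vector<float>& data, float k) {
    static_assert(Factor >= 1, "unroll factor must be at least 1");
    const std::size_t n = data.size();
    std::size_t i = 0;

    // Main loop: each trip covers Factor consecutive elements.
    for (; i + Factor <= n; i += Factor) {
        // Inner loop has a compile-time trip count, so an optimizing
        // compiler can flatten it into Factor straight-line copies.
        for (std::size_t j = 0; j < Factor; ++j) {
            data[i + j] *= k;
        }
    }

    // Remainder loop: handles the final n % Factor elements.
    for (; i < n; ++i) {
        data[i] *= k;
    }
}
```

Calling `scale_unrolled<4>(v, 2.0f)` on a seven-element vector runs the main loop once over four elements and the remainder loop over the other three. Choosing among such factors is the search problem the abstract describes; the abstract says only that the candidate set is reduced using characteristics of the program and the device, without naming them.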
