Optimal loop unrolling for GPGPU programs

Abstract

Graphics Processing Units (GPUs) are massively parallel, many-core processors with tremendous computational power and very high memory bandwidth. With the advent of general purpose programming models such as NVIDIA's CUDA and the new OpenCL standard, general purpose programming using GPUs (GPGPU) has become very popular. However, the GPU architecture and programming model have brought along with them many new challenges and opportunities for compiler optimizations. One such classical optimization is loop unrolling. Current GPU compilers perform limited loop unrolling. In this paper, we attempt to understand the impact of loop unrolling on GPGPU programs. We develop a semi-automatic, compile-time approach for identifying optimal unroll factors for suitable loops in GPGPU programs. In addition, we propose techniques for reducing the number of unroll factors evaluated, based on the characteristics of the program being compiled and the device being compiled to. We use these techniques to evaluate the effect of loop unrolling on a range of GPGPU programs and show that we correctly identify the optimal unroll factors. The optimized versions run up to 70 percent faster than the unoptimized versions.
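The abstract treats unrolling as a transformation whose payoff depends on the chosen unroll factor. As a generic illustration only (not the authors' implementation or their factor-selection method), the C sketch below shows a loop of the kind a single GPU thread might execute, written once as-is and once unrolled by a hypothetical factor of 4 with a remainder loop. The function names `axpy_rolled` and `axpy_unrolled4` are illustrative; in CUDA source the same transformation is usually requested from the compiler with `#pragma unroll` rather than written by hand.

```c
#include <stddef.h>

/* Rolled loop: one element per iteration, so every element of useful
   work also pays for one index update and one compare-and-branch. */
void axpy_rolled(int a, const int *x, int *y, size_t n) {
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* The same loop unrolled by a factor of 4 (an arbitrary, illustrative
   choice). The main loop processes four elements per trip; the
   remainder loop handles the final n % 4 elements. Writing the bound
   as i + 4 <= n avoids unsigned underflow when n < 4. */
void axpy_unrolled4(int a, const int *x, int *y, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        y[i]     += a * x[i];
        y[i + 1] += a * x[i + 1];
        y[i + 2] += a * x[i + 2];
        y[i + 3] += a * x[i + 3];
    }
    for (; i < n; ++i)   /* remainder: 0 to 3 elements */
        y[i] += a * x[i];
}
```

Both functions produce identical results for any `n`. The unrolled version trades larger code (and, on a GPU, potentially more registers per thread) for fewer branch and index-update instructions, which is why the most profitable factor can differ from one loop and one device to another.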

Bibliographic record

Source: 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010, pp. 1-11 (11 pages)
Conference location: Atlanta, GA (US)
Authors: Murthy Giridhar Sreenivasa; Ravishankar Mahesh; Baskaran Muthu Manikandan; Sadayappan P.
Author affiliation: Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio, USA
Original format: PDF
Language: English
Chinese Library Classification (CLC): TP311.133
Keywords: Compiler optimizations; GPGPU; Loop Unrolling

Similar literature

1. Loop Unrolling for Energy Efficiency in Low-Cost Field-Programmable Gate Arrays [J]. Dumpala Naveen Kumar, Patil Shivukumar B., Holcomb Daniel. ACM Transactions on Reconfigurable Technology and Systems, 2018, no. 4.
2. TL-HLS: Methodology for Low Cost Hardware Trojan Security Aware Scheduling With Optimal Loop Unrolling Factor During High Level Synthesis [J]. Anirban Sengupta, Saumya Bhadauria, Saraju P. Mohanty. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2017, no. 4.
3. Optimal Loop Unrolling and Shifting for Reconfigurable Architectures [J]. Ozana Silvia Dragomir, Todor Stefanov, Koen Bertels. ACM Transactions on Reconfigurable Technology and Systems, 2009, no. 4.
4. Optimal loop unrolling for GPGPU programs [C]. Murthy G.S., Ravishankar M., Baskaran M.M. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), 2010.
5. Program speedup through generalized loop unrolling [D]. Leng, Tau. 2001.
6. Second-order optimality conditions for nonlinear programs and mathematical programs [O]. Ikram Daidai.
7. Loop Unrolling for Energy Efficiency in Low-Cost Field-Programmable Gate Arrays [O]. Naveen Kumar Dumpala, Shivukumar B. Patil, Daniel Holcomb. 2019.
8. Effects of Loop Unrolling and Loop Fusion on Register Pressure and Code Performance [R]. Shires, D. 1997.