International Journal of High Performance Computing and Networking

Performance evaluation of OpenMP's target construct on GPUs - exploring compiler optimisations



Abstract

OpenMP is a directive-based shared-memory parallel programming model that has been widely used for many years. From OpenMP 4.0 onwards, GPU platforms are supported by extending OpenMP's high-level parallel abstractions with accelerator programming. This extension allows programmers to write GPU programs in standard C/C++ or Fortran without exposing too many details of GPU architectures. However, such high-level programming models generally shift additional program-optimisation work onto compilers and runtime systems; otherwise, OpenMP programs can be slower than fully hand-tuned, or even naive, implementations written in low-level programming models such as CUDA. To study the potential performance improvements from compiling and optimising high-level programs for GPU execution, in this paper we: 1) evaluate a set of OpenMP benchmarks on two NVIDIA Tesla GPUs (K80 and P100); and 2) conduct a comparative performance analysis of hand-written CUDA programs and the GPU programs automatically generated by the IBM XL and clang/LLVM compilers.
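
As context for the evaluation described above, the following is a minimal sketch of the kind of OpenMP target-offload code the abstract refers to: a simple vector-add kernel using the target, teams, distribute, and parallel for directives with explicit map clauses. The kernel, array size, and build command are illustrative assumptions and are not taken from the paper's benchmark suite.

    /* Minimal sketch of an OpenMP target-offload kernel (illustrative,
     * not one of the paper's benchmarks). */
    #include <stdio.h>
    #include <stdlib.h>

    #define N (1 << 20)

    int main(void) {
        double *a = malloc(N * sizeof(double));
        double *b = malloc(N * sizeof(double));
        double *c = malloc(N * sizeof(double));
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

        /* The target construct offloads the loop to the GPU; the map clauses
         * describe host-device data movement, and teams/distribute/parallel for
         * expose hierarchical parallelism that the compiler lowers to
         * CUDA-style thread blocks and threads. */
        #pragma omp target teams distribute parallel for \
                map(to: a[0:N], b[0:N]) map(from: c[0:N])
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[42] = %f\n", c[42]);
        free(a); free(b); free(c);
        return 0;
    }

With clang/LLVM, such a program is typically built with something like clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda vecadd.c (exact flags vary by compiler version); IBM XL uses its own offload options. It is the GPU code that these compilers generate from pragmas like the one above that the paper compares against hand-written CUDA.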

