International Journal of High Performance Computing and Networking

Performance evaluation of OpenMP's target construct on GPUs - exploring compiler optimisations



Abstract

OpenMP is a directive-based shared-memory parallel programming model that has been widely used for many years. From OpenMP 4.0 onwards, GPU platforms are supported by extending OpenMP's high-level parallel abstractions with accelerator programming. This extension allows programmers to write GPU programs in standard C/C++ or Fortran without exposing too many details of GPU architectures. However, such high-level programming models generally shift additional program-optimisation work onto compilers and runtime systems; otherwise, OpenMP programs can be slower than fully hand-tuned, or even naive, implementations written in low-level programming models such as CUDA. To study the potential performance improvements from compiling and optimising high-level programs for GPU execution, in this paper we: 1) evaluate a set of OpenMP benchmarks on two NVIDIA Tesla GPUs (K80 and P100); and 2) conduct a comparative performance analysis of hand-written CUDA programs and the GPU programs automatically generated by the IBM XL and clang/LLVM compilers.
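
As context for the evaluation described above, the following is a minimal sketch of the kind of OpenMP target-offload code the abstract refers to: a simple vector-add kernel using the target, teams, distribute, and parallel for directives with explicit map clauses. The kernel, array size, and build command are illustrative assumptions and are not taken from the paper's benchmark suite.

    /* Minimal sketch of an OpenMP target-offload kernel (illustrative,
     * not one of the paper's benchmarks). */
    #include <stdio.h>
    #include <stdlib.h>

    #define N (1 << 20)

    int main(void) {
        double *a = malloc(N * sizeof(double));
        double *b = malloc(N * sizeof(double));
        double *c = malloc(N * sizeof(double));
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

        /* The target construct offloads the loop to the GPU; the map clauses
         * describe host-device data movement, and teams/distribute/parallel for
         * expose hierarchical parallelism that the compiler lowers to
         * CUDA-style thread blocks and threads. */
        #pragma omp target teams distribute parallel for \
                map(to: a[0:N], b[0:N]) map(from: c[0:N])
        for (int i = 0; i < N; i++)
            c[i] = a[i] + b[i];

        printf("c[42] = %f\n", c[42]);
        free(a); free(b); free(c);
        return 0;
    }

With clang/LLVM, such a program is typically built with something like clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda vecadd.c (exact flags vary by compiler version); IBM XL uses its own offload options. It is the GPU code that these compilers generate from pragmas like the one above that the paper compares against hand-written CUDA.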

