Published in: IEEE International Parallel and Distributed Processing Symposium Workshops and PhD Forum

Evaluating OpenMP 4.0's Effectiveness as a Heterogeneous Parallel Programming Model


Abstract

Although the OpenMP 4.0 standard has been available since 2013, support for GPUs was absent until very recently, with only a handful of experimental compilers available. In this work we evaluate the performance of Cray's new NVIDIA GPU-targeting implementation of OpenMP 4.0 with the mini-apps TeaLeaf, CloverLeaf and BUDE. We successfully port each of the applications, using a simple and consistent design throughout, and achieve performance on an NVIDIA K20X that is comparable to Cray's OpenACC in all cases. BUDE, a compute-bound code, required 2.2x the runtime of an equivalently optimised CUDA code, which we believe is caused by an inflated frequency of control flow operations and less efficient arithmetic optimisation. Impressively, both TeaLeaf and CloverLeaf, which are memory-bandwidth-bound codes, required only 1.3x the runtime of hand-optimised CUDA implementations. Overall, we find that OpenMP 4.0 is a highly usable open standard capable of performant heterogeneous execution, making it a promising option for scientific application developers.
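For readers unfamiliar with the programming model being evaluated, the following is a minimal sketch of what an OpenMP 4.0 offloaded loop can look like in C. It is not taken from TeaLeaf, CloverLeaf or BUDE, and the abstract does not show the authors' actual design; the function name `axpy_offload` and the kernel itself are assumptions chosen only because a simple streaming update is a typical memory-bandwidth-bound pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel, not from the paper: y[i] = a * x[i] + y[i].
   With an offloading compiler the loop can run on the device. Without
   OpenMP support the pragma is ignored and the loop runs serially on
   the host, producing the same result. */
void axpy_offload(int n, double a, const double *x, double *y)
{
    assert(n > 0 && x != NULL && y != NULL);

    /* map(to:) copies x to the device before the region;
       map(tofrom:) copies y to the device and back to the host afterwards. */
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The combined `target teams distribute parallel for` construct and the `map` clauses are part of the OpenMP 4.0 specification. The flags needed to enable offloading differ between compilers, so whether this loop actually executes on a GPU depends on the toolchain in use.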
