Comparison of Parallelisation Approaches, Languages, and Compilers for Unstructured Mesh Algorithms on GPUs

Abstract

Efficiently exploiting GPUs is increasingly essential in scientific computing, as many current and upcoming supercomputers are built around them. To facilitate this, there are a number of programming approaches, such as CUDA, OpenACC and OpenMP 4, supporting different programming languages (mainly C/C++ and Fortran). There are also several compiler suites (clang, nvcc, PGI, XL), each supporting different combinations of languages. In this study, we take a detailed look at some of the currently available options, and carry out a comprehensive analysis and comparison using computational loops and applications from the domain of unstructured mesh computations. Beyond runtimes and performance metrics (GB/s), we explore factors that influence performance, such as register counts, occupancy, usage of different memory types, instruction counts, and algorithmic differences. Results of this work show that clang's CUDA compiler frequently outperforms NVIDIA's nvcc, that directive-based approaches suffer performance issues on complex kernels, and that OpenMP 4 support is maturing in clang and XL, currently running around 10% slower than CUDA.
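For context, the unstructured mesh loops studied here typically iterate over edges or cells and reach node data through indirection (mapping) arrays, which is what makes them harder to optimise than regular structured stencils. Below is a minimal, hypothetical CUDA sketch of such an indirect-increment edge kernel; the names (edge_kernel, edge2node, node_val, node_res) are illustrative and not taken from the paper, which may use colouring or other race-resolution strategies instead of atomics.

    #include <cuda_runtime.h>

    // Hypothetical edge kernel: each edge gathers two node values through an
    // indirection map and scatters a contribution back to both nodes.
    // Atomic adds resolve the races between edges sharing a node; double-precision
    // atomicAdd requires a GPU of compute capability 6.0 or newer.
    __global__ void edge_kernel(int num_edges,
                                const int    *__restrict__ edge2node, // 2 entries per edge
                                const double *__restrict__ node_val,
                                double       *node_res)
    {
        int e = blockIdx.x * blockDim.x + threadIdx.x;
        if (e >= num_edges) return;

        int n0 = edge2node[2 * e + 0];
        int n1 = edge2node[2 * e + 1];

        // Indirect reads (gather)
        double flux = 0.5 * (node_val[n0] - node_val[n1]);

        // Indirect writes (scatter)
        atomicAdd(&node_res[n0], -flux);
        atomicAdd(&node_res[n1],  flux);
    }

    // Launch sketch: one thread per edge
    // edge_kernel<<<(num_edges + 255) / 256, 256>>>(num_edges, d_edge2node, d_node_val, d_node_res);

The same loop can be expressed with OpenACC or OpenMP 4 target directives over a plain C loop, which is exactly the trade-off between control and portability that the paper compares across compilers.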