Comparison of Parallelisation Approaches, Languages, and Compilers for Unstructured Mesh Algorithms on GPUs

Abstract

Efficiently exploiting GPUs is increasingly essential in scientific computing, as many current and upcoming supercomputers are built around them. To facilitate this, there are a number of programming approaches, such as CUDA, OpenACC and OpenMP 4, supporting different programming languages (mainly C/C++ and Fortran). There are also several compiler suites (clang, nvcc, PGI, XL), each supporting different combinations of languages. In this study, we take a detailed look at some of the currently available options, and carry out a comprehensive analysis and comparison using computational loops and applications from the domain of unstructured mesh computations. Beyond runtimes and performance metrics (GB/s), we explore factors that influence performance, such as register counts, occupancy, usage of different memory types, instruction counts, and algorithmic differences. Results of this work show that clang's CUDA compiler frequently outperforms NVIDIA's nvcc, that directive-based approaches suffer performance issues on complex kernels, and that OpenMP 4 support is maturing in clang and XL, currently running around 10% slower than CUDA.
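For context, the unstructured mesh loops studied here typically iterate over edges or cells and reach node data through indirection (mapping) arrays, which is what makes them harder to optimise than regular structured stencils. Below is a minimal, hypothetical CUDA sketch of such an indirect-increment edge kernel; the names (edge_kernel, edge2node, node_val, node_res) are illustrative and not taken from the paper, which may use colouring or other race-resolution strategies instead of atomics.

    #include <cuda_runtime.h>

    // Hypothetical edge kernel: each edge gathers two node values through an
    // indirection map and scatters a contribution back to both nodes.
    // Atomic adds resolve the races between edges sharing a node; double-precision
    // atomicAdd requires a GPU of compute capability 6.0 or newer.
    __global__ void edge_kernel(int num_edges,
                                const int    *__restrict__ edge2node, // 2 entries per edge
                                const double *__restrict__ node_val,
                                double       *node_res)
    {
        int e = blockIdx.x * blockDim.x + threadIdx.x;
        if (e >= num_edges) return;

        int n0 = edge2node[2 * e + 0];
        int n1 = edge2node[2 * e + 1];

        // Indirect reads (gather)
        double flux = 0.5 * (node_val[n0] - node_val[n1]);

        // Indirect writes (scatter)
        atomicAdd(&node_res[n0], -flux);
        atomicAdd(&node_res[n1],  flux);
    }

    // Launch sketch: one thread per edge
    // edge_kernel<<<(num_edges + 255) / 256, 256>>>(num_edges, d_edge2node, d_node_val, d_node_res);

The same loop can be expressed with OpenACC or OpenMP 4 target directives over a plain C loop, which is exactly the trade-off between control and portability that the paper compares across compilers.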