Journal: Computing

Performance of CPU/GPU compiler directives on ISO/TTI kernels




GPUs are steadily becoming ubiquitous in High Performance Computing, as their ability to improve the performance per watt of compute-intensive algorithms relative to multicore CPUs has been recognized. The primary shortcoming of a GPU is usability: vendor-specific APIs differ considerably from existing programming languages, and optimizing an application requires substantial knowledge of both the device and the programming interface. To alleviate this problem, a growing number of higher-level programming models now target GPUs. The ultimate goal of a high-level model is to expose an easy-to-use interface that lets the user offload compute-intensive portions of code (kernels) to the GPU and tune the code for the target accelerator, maximizing overall performance with reduced development effort. In this paper, we share our experiences with three notable high-level, directive-based GPU programming models: PGI, CAPS, and OpenACC (from CAPS and PGI), all on an Nvidia M2090 GPU. We analyze their performance and programmability on Isotropic (ISO) and Tilted Transversely Isotropic (TTI) finite difference kernels, which are primary components of the Reverse Time Migration (RTM) application used in oil and gas exploration for seismic imaging of the sub-surface. When the kernels are ported to a single GPU using these directives, we observe an average 1.5-1.8x performance improvement for both ISO and TTI kernels compared with optimized multi-threaded CPU implementations using OpenMP.
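To make the idea of directive-based offloading concrete, the following is a minimal sketch of one time step of an isotropic finite difference kernel annotated with an OpenACC directive. It is not the paper's code: the function name `iso_step`, the `IDX` macro, the precomputed coefficient array `vel2dt2`, and the second-order stencil are all assumptions chosen for brevity, and production RTM kernels typically use higher-order stencils.

```c
#include <assert.h>
#include <stddef.h>

/* Flattened index into an nx*ny*nz grid with x as the fastest dimension.
   (Hypothetical helper, introduced only for this sketch.) */
#define IDX(i, j, k, nx, ny) \
    ((size_t)(k) * (size_t)(ny) * (size_t)(nx) + (size_t)(j) * (size_t)(nx) + (size_t)(i))

/* One time step of the isotropic acoustic wave equation, second order in
   time and space (an illustrative assumption, not the paper's stencil).
   vel2dt2 holds (v * dt / h)^2 for every grid point.
   Only interior points are updated; boundary values of next are left as is. */
void iso_step(int nx, int ny, int nz,
              const float *restrict prev, const float *restrict cur,
              float *restrict next, const float *restrict vel2dt2)
{
    assert(nx >= 3 && ny >= 3 && nz >= 3);
    size_t n = (size_t)nx * (size_t)ny * (size_t)nz;
    size_t sy = (size_t)nx;               /* stride between neighbours in y */
    size_t sz = (size_t)nx * (size_t)ny;  /* stride between neighbours in z */

    /* The directive asks an OpenACC compiler to offload the loop nest.
       A compiler without OpenACC support ignores it and runs serially. */
    #pragma acc parallel loop collapse(3) \
        copyin(prev[0:n], cur[0:n], vel2dt2[0:n]) copy(next[0:n])
    for (int k = 1; k < nz - 1; k++) {
        for (int j = 1; j < ny - 1; j++) {
            for (int i = 1; i < nx - 1; i++) {
                size_t c = IDX(i, j, k, nx, ny);
                /* 7-point Laplacian of the current wavefield */
                float lap = cur[c - 1] + cur[c + 1]
                          + cur[c - sy] + cur[c + sy]
                          + cur[c - sz] + cur[c + sz]
                          - 6.0f * cur[c];
                next[c] = 2.0f * cur[c] - prev[c] + vel2dt2[c] * lap;
            }
        }
    }
}
```

The loop body is ordinary C, and the offload request is confined to a single pragma. This is the usability argument made for directive-based models. In a real time-stepping loop the arrays would normally be kept resident on the device with an enclosing data region rather than copied on every call, a tuning step omitted here.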


