Journal: Journal of Supercomputing

Automatic translation of data parallel programs for heterogeneous parallelism through OpenMP offloading


Abstract

Heterogeneous multicores like GPGPUs are now commonplace in modern computing systems. Although heterogeneous multicores offer the potential for high performance, programmers are struggling to program such systems. This paper presents OAO, a compiler-based approach to automatically translate shared-memory OpenMP data-parallel programs to run on heterogeneous multicores through OpenMP offloading directives. Given the large user base of shared memory OpenMP programs, our approach allows programmers to continue using a single-source-based programming language that they are familiar with while benefiting from the heterogeneous performance. OAO introduces a novel runtime optimization scheme to automatically eliminate unnecessary host-device communication to minimize the communication overhead between the host and the accelerator device. We evaluate OAO by applying it to 23 benchmarks from the PolyBench and Rodinia suites on two distinct GPU platforms. Experimental results show that OAO achieves up to 32x speedup over the original OpenMP version, and can reduce the host-device communication overhead by up to 99% over the hand-translated version.
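To make the kind of translation described above concrete, the following is a minimal hand-written sketch in C. It is not output from OAO. The function names (`saxpy_host`, `saxpy_offload`, `saxpy_twice_resident`) and the SAXPY kernel are assumptions chosen for illustration. The first function is a shared-memory OpenMP loop, the second is one plausible offloaded equivalent using standard `target` and `map` directives, and the third uses a static `target data` region to avoid a redundant host-device transfer between two kernels. The abstract describes a runtime scheme for eliminating such transfers; the static `target data` region here only illustrates the effect, not OAO's mechanism.

```c
#include <assert.h>
#include <stddef.h>

/* Original shared-memory form: a data-parallel loop on host threads. */
void saxpy_host(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    #pragma omp parallel for
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Offloaded form (illustrative): x is copied to the device, y is copied
 * to the device and back. Falls back to the host if no device exists. */
void saxpy_offload(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    #pragma omp target teams distribute parallel for \
            map(to: x[0:n]) map(tofrom: y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Two kernels sharing one data region: x and y stay resident on the
 * device between the kernels, so y is not copied back and forth twice. */
void saxpy_twice_resident(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    #pragma omp target data map(to: x[0:n]) map(tofrom: y[0:n])
    {
        #pragma omp target teams distribute parallel for
        for (size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];

        #pragma omp target teams distribute parallel for
        for (size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }
}
```

Compiled without OpenMP support, the pragmas are ignored and all three functions run sequentially with the same results, which makes the sketch easy to check on any machine. Calling `saxpy_offload` twice would give the same values as `saxpy_twice_resident` but would transfer `y` in both directions on each call.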
