ACM SIGPLAN Workshop on Declarative Aspects of Multicore Programming (DAMP), 2011

Breaking the GPU Programming Barrier with the Auto-Parallelising SAC Compiler


Abstract

Over recent years, the use of Graphics Processing Units (GPUs) for general-purpose computing has become increasingly popular. The main reasons for this development are the attractive performance/price and performance/power ratios of these architectures. However, substantial performance gains from GPUs come at a price: they require extensive programming expertise and, typically, a substantial re-coding effort. Although the programming experience has been significantly improved by existing frameworks like CUDA and OpenCL, it remains a challenge to utilise these devices effectively. Directive-based approaches such as hiCUDA or OpenMP variants offer further improvements but have not eliminated the need for expertise in these complex architectures. Similarly, special-purpose programming languages such as Microsoft's Accelerator try to lower the barrier further: they provide the programmer with a special form of GPU data structures and operations on them, which are then compiled into GPU code. In this paper, we take this trend towards a completely implicit, high-level approach yet another step further. We generate CUDA code from a MATLAB-like high-level functional array programming language, Single Assignment C (SaC). To do so, we identify which data structures and operations can be successfully mapped onto GPUs and transform existing programs accordingly. This paper presents the first runtime results from our GPU backend, together with the basic set of GPU-specific program optimisations that turned out to be essential. Despite our high-level program specifications, we show that for a number of benchmarks speedups between a factor of 5 and 50 can be achieved through our parallelising compiler.

