ACM SIGPLAN Workshop on Declarative Aspects of Multicore Programming (DAMP), 2011

Breaking the GPU Programming Barrier with the Auto-Parallelising SAC Compiler


Abstract

Over recent years, the use of Graphics Processing Units (GPUs) for general-purpose computing has become increasingly popular. The main reasons for this development are the attractive performance/price and performance/power ratios of these architectures. However, substantial performance gains from GPUs come at a price: they require extensive programming expertise and, typically, a substantial re-coding effort. Although the programming experience has been significantly improved by existing frameworks like CUDA and OpenCL, it remains a challenge to utilise these devices effectively. Directive-based approaches such as hiCUDA or OpenMP variants offer further improvements but have not eliminated the need for expertise in these complex architectures. Similarly, special-purpose programming languages such as Microsoft's Accelerator try to lower the barrier further: they provide the programmer with a special form of GPU data structures and operations on them, which are then compiled into GPU code. In this paper, we take this trend towards a completely implicit, high-level approach yet another step further. We generate CUDA code from a MATLAB-like high-level functional array programming language, Single Assignment C (SaC). To do so, we identify which data structures and operations can be successfully mapped onto GPUs and transform existing programs accordingly. This paper presents the first runtime results from our GPU backend, together with the basic set of GPU-specific program optimisations that turned out to be essential. Despite our high-level program specifications, we show that for a number of benchmarks speedups between a factor of 5 and 50 can be achieved through our parallelising compiler.

