Transformations de programme automatiques et source-à-source pour accélérateurs matériels de type GPU

Machine translation: Automatic and source-to-source program transformations for GPU-type hardware accelerators

Abstract

Since the early 2000s, the raw performance of processors has stopped increasing exponentially. Modern graphics processing units (GPUs) are designed as arrays of hundreds or thousands of compute units. Their compute capacity quickly led to their being diverted from their original purpose and used as accelerators for general-purpose computation. However, programming a GPU efficiently for computations other than 3D rendering remains challenging.

The current jungle in the hardware ecosystem is mirrored in the software world, with more and more programming models, new languages, different APIs, and so on. But no one-size-fits-all solution has emerged.

This thesis proposes a compiler-based solution that partially addresses the three "P" properties: Performance, Portability, and Programmability. The goal is to automatically transform a sequential program into an equivalent program accelerated with a GPU. A prototype, Par4All, is implemented and validated with numerous experiments. Programmability and portability are enforced by definition; performance may not match what an expert programmer can obtain, but it has been measured as excellent for a wide range of kernels and applications.

A survey of GPU architectures and of trends in language and framework design is presented. Data movement between the host and the accelerator is managed without involving the developer. An algorithm is proposed to optimize communication by sending data to the GPU as early as possible and keeping it there as long as it is not required by the host. Loop transformation techniques for kernel code generation are involved, and even well-known ones have to be adapted to match specific GPU constraints. They are combined in a coherent and flexible way and dynamically scheduled within the compilation process of an interprocedural compiler. Some preliminary work on extending the approach to multiple GPUs is presented.
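The communication idea in the abstract, keeping data on the GPU until the host needs it, can be illustrated with a small bookkeeping model. This is only a sketch and not the algorithm from the thesis: it tracks residency at run time, does not model sending data "as early as possible", and performs no real GPU transfers. The names `TransferTracker`, `run_kernel`, `host_read`, `host_write`, and `demo` are hypothetical.

```python
# Illustrative model only: counts host<->GPU copies under a "keep it on the
# GPU" policy. No real GPU is involved; all names here are hypothetical.

class TransferTracker:
    def __init__(self):
        self.on_gpu = set()      # arrays with a valid copy on the GPU
        self.gpu_dirty = set()   # arrays whose newest value is only on the GPU
        self.copies_to_gpu = 0
        self.copies_to_host = 0

    def run_kernel(self, reads, writes):
        """Launch a kernel: upload only inputs that are not already resident."""
        for name in reads:
            if name not in self.on_gpu:
                self.copies_to_gpu += 1
                self.on_gpu.add(name)
        for name in writes:
            self.on_gpu.add(name)
            self.gpu_dirty.add(name)

    def host_read(self, name):
        """The host needs the value: download only if the GPU copy is newer."""
        if name in self.gpu_dirty:
            self.copies_to_host += 1
            self.gpu_dirty.discard(name)

    def host_write(self, name):
        """The host overwrites the array: the GPU copy becomes stale."""
        self.on_gpu.discard(name)
        self.gpu_dirty.discard(name)


def demo():
    """Ten kernel launches on arrays a and b, then one host read of b."""
    t = TransferTracker()
    for _ in range(10):
        t.run_kernel(reads=["a", "b"], writes=["b"])
    t.host_read("b")
    return t.copies_to_gpu, t.copies_to_host
```

In this model, `demo()` counts two uploads and one download, whereas a naive scheme that copies every input before each launch and every output after it would perform twenty uploads and ten downloads.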

Bibliographic record

  • Author

    Amini Mehdi;

  • Author affiliation
  • Year 2012
  • Total pages
  • Original format PDF
  • Text language en
  • Chinese Library Classification
