Automatic Offloading C++ Expression Templates to CUDA Enabled GPUs


Abstract

In the last few years, many scientific applications have been developed for powerful graphics processing units (GPUs) and have achieved remarkable speedups. This success can be partially attributed to high-performance, host-callable GPU library routines that are offloaded to GPUs at runtime. These library routines are based on C/C++-like programming toolkits such as CUDA from NVIDIA and have the same calling signatures as their CPU counterparts. Recently, with sufficient support for C++ templates in CUDA, the emergence of template libraries has enabled further advances in code reusability and rapid software development for GPUs. However, Expression Templates (ET), which have been very popular for implementing data-parallel scientific software for host CPUs because of their intuitive and mathematics-like syntax, have been underutilized by GPU development libraries. This lack of ET usage stems from the difficulty of offloading expression templates from hosts to GPUs: instantiated expressions cannot be passed to GPU kernels, and the exact form of the expressions is not known at the time of coding. This paper presents a general approach that enables automatic offloading of C++ expression templates to CUDA-enabled GPUs by using C++ metaprogramming techniques and Just-In-Time (JIT) compilation to generate and compile CUDA kernels for the corresponding expression templates, and then executing those kernels with the appropriate arguments. This approach allows developers to port applications to run on GPUs with virtually no code modifications. More specifically, this paper uses a large ET-based data-parallel physics library called QDP++ as an example to illustrate many aspects of the approach to offloading expression templates automatically, and demonstrates very good speedups for typical QDP++ applications running on GPUs compared with CPUs using this method of offloading.
In addition, this approach of automatically offloading expression templates could be applied to other many-core accelerators that provide C++ programming toolkits with support for C++ templates.
