Optimization techniques for mapping algorithms and applications onto CUDA GPU platforms and CPU-GPU heterogeneous platforms.

Abstract

An emerging trend in processor architecture points to a doubling of the number of cores per chip every two years at the same or a lower clock speed. Of particular interest to this thesis is the class of many-core processors, which are becoming more attractive due to their high performance, low cost, and low power consumption. The main goal of this dissertation is to develop optimization techniques for mapping algorithms and applications onto CUDA GPUs and CPU-GPU heterogeneous platforms.

The fast Fourier transform (FFT) is a fundamental tool in computational science and engineering, and hence a GPU-optimized implementation is of paramount importance. We first study the mapping of the 3D FFT onto recent CUDA GPUs and develop a new approach that minimizes the number of global memory accesses and overlaps the computations along the different dimensions. We obtain some of the fastest known implementations for the computation of multi-dimensional FFTs.

We then present a highly multithreaded FFT-based direct Poisson solver that is optimized for recent NVIDIA GPUs. In addition to massive multithreading, our algorithm carefully manages the multiple layers of the memory hierarchy so that all global memory accesses are coalesced into 128-byte device memory transactions. As a result, we have achieved up to 375 GFLOPS with a bandwidth of 120 GB/s on the GTX 480.

We further extend our methodology to CPU-GPU heterogeneous platforms for the case when the input is too large to fit in the GPU global memory. We develop optimization techniques for both memory-bound and computation-bound applications. The main challenge here is to minimize data transfer between the CPU memory and the device memory and to overlap these transfers with kernel execution as much as possible. For memory-bound applications, we achieve a near-peak effective PCIe bus bandwidth of 9-10 GB/s and performance as high as 145 GFLOPS for multi-dimensional FFT computations and for solving the Poisson equation. We extend our CPU-GPU software pipeline to a computation-bound application, DGEMM, and achieve the illusion of a device memory as large as the CPU memory, with a computation throughput similar to that of a pure-GPU implementation.
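
The effect of overlapping CPU-GPU transfers with kernel execution can be illustrated with a small timing model. The sketch below is not taken from the dissertation: the function name `pipeline_makespan`, the three-stage split (host-to-device copy, kernel, device-to-host copy), and the per-chunk timings are all hypothetical. It also assumes that each stage runs on its own resource, so copies in both directions and the kernel can proceed concurrently, and that enough device buffers exist to hold chunks waiting between stages.

```python
def pipeline_makespan(num_chunks, stage_times, overlap=True):
    """Total time to push num_chunks equal chunks through all stages.

    stage_times: per-chunk duration of each stage, in arbitrary units,
                 e.g. (host-to-device copy, kernel, device-to-host copy).
    overlap=False: every stage of every chunk runs back to back.
    overlap=True:  each stage is a separate resource, so one chunk can be
                   copied while another is being computed (assumption).
    """
    if not overlap:
        return num_chunks * sum(stage_times)

    # finish[s] is the time at which stage s completed its latest chunk.
    finish = [0.0] * len(stage_times)
    for _ in range(num_chunks):
        ready = 0.0  # time at which this chunk left the previous stage
        for s, duration in enumerate(stage_times):
            start = max(ready, finish[s])  # wait for the data and for the stage
            finish[s] = start + duration
            ready = finish[s]
    return finish[-1]


if __name__ == "__main__":
    # Hypothetical transfer-dominated workload: copies cost more than the kernel.
    print(pipeline_makespan(8, (2, 1, 2), overlap=False))
    print(pipeline_makespan(8, (2, 1, 2), overlap=True))
    # Hypothetical kernel-dominated workload: the kernel costs more than the copies.
    print(pipeline_makespan(8, (1, 4, 1), overlap=False))
    print(pipeline_makespan(8, (1, 4, 1), overlap=True))
```

In this model the overlapped time approaches the number of chunks multiplied by the slowest stage. With the made-up timings above, the transfer-dominated case drops from 40 to 19 units and is limited by the copies, while the kernel-dominated case drops from 48 to 34 units against 32 units of pure kernel time. This mirrors, in simplified form, why a memory-bound pipeline is judged by effective PCIe bandwidth and a computation-bound one by how close it comes to pure-GPU throughput.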

Bibliographic record

Author: Wu, Jing
Author affiliation: University of Maryland, College Park
Degree-granting institution: University of Maryland, College Park
Subject: Engineering Computer; Computer Science
Degree: Ph.D.
Year: 2014
Pages: 179 p.
Original format: PDF
Language: English

Similar literature

1. On the Performance and Energy Consumption of Molecular Dynamics Applications for Heterogeneous CPU-GPU Platforms Based on Gromacs [J]. A. Poghosyan, H. Astsatryan, W. Narsisian. Cybernetics and information technologies: CIT. 2017, No. 5

2. Analysis of energy efficiency of a parallel AES algorithm for CPU-GPU heterogeneous platforms [J]. Fei Xiongwei, Li Kenli, Yang Wangdong. Parallel Computing. 2020, No. Juna

3. Efficient adaptive load balancing approach for compressive background subtraction algorithm on heterogeneous CPU-GPU platforms [J]. Mabrouk Lhoussein, Huet Sylvain, Houzet Dominique. Journal of Real-Time Image Processing. 2020, No. 5

4. Evaluation of NDVI and NDWI parameters in CPU-GPU Heterogeneous Platforms based CUDA [C]. Fatima Zahra GUERROUJ, Rachid LATIF, Amine SADDIK. International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications. 2020

5. Performance analysis and acceleration of nuclear physics application on high-performance computing platforms using GPGPUs and topology-aware mapping techniques [D]. Oryspayev, Dossay. 2016

6. CPU-GPU hybrid accelerating the Zuker algorithm for RNA secondary structure prediction applications [O]. Guoqing Lei, Yong Dou, Wen Wan. 2012

7. Sparse matrix partitioning for optimizing SpMV on CPU-GPU heterogeneous platforms [O]. Akrem Benatia, Weixing Ji, Yizhuo Wang. 2019
