Kernel Specialization for Improved Adaptability and Performance on Graphics Processing Units (GPUs)

Abstract

Graphics processing units (GPUs) offer significant speedups over CPUs for certain classes of applications. However, programming for GPUs is challenging. Many parameters affect performance, and their values may change depending on both the problem instance and GPU hardware specifics. In addition, most GPU kernels are compiled once, with performance optimizations applied at application compile time. As a result, many GPU libraries and programs have limited adaptability to variations among problem instances and hardware configurations. These factors limit code reuse and the applicability of GPU computing to a wider variety of problems. This paper introduces GPGPU kernel specialization, a technique used to describe highly adaptable kernels that exhibit high performance across a wide range of programmer variables as well as different generations of GPUs. We also introduce our GPU Prototyping Framework (GPU-PF) for dynamic runtime generation of customized GPU kernels incorporating both problem-specific and implementation-specific parameters. GPU-PF fully separates the GPU and CPU code so that the GPU code can be compiled during program execution once all the parameters are known. This work explores the implementation and parameterization of two real-world applications targeting two generations of NVIDIA CUDA-enabled GPUs using kernel specialization and GPU-PF: large template matching and cone-beam image reconstruction via back projection. Starting with high-performance GPU kernels that compare favorably to multi-threaded reference implementations, kernel specialization is shown to increase adaptability while providing performance improvements, including improved run time and reduced resource usage. Kernel specialization offers productivity benefits, improved library code, and a means to increase the parameterizability of GPGPU implementations.
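The idea of compiling GPU code only once all parameters are known can be illustrated with a minimal sketch. This is not GPU-PF's API, which the abstract does not describe. The kernel text, the parameter names BLOCK_SIZE and SCALE, and the helpers specialize and get_specialized are all hypothetical. The sketch only builds the specialized source text on the host. In a real program that text would then be handed to a runtime compiler, a step the abstract does not detail.

```python
# Hypothetical CUDA kernel source. BLOCK_SIZE and SCALE are deliberately left
# undefined so they can be fixed as compile-time constants at run time.
KERNEL_SOURCE = """
extern "C" __global__ void scale(const float *in, float *out, int n)
{
    int i = blockIdx.x * BLOCK_SIZE + threadIdx.x;
    if (i < n) out[i] = SCALE * in[i];
}
"""


def specialize(source, params):
    """Prepend one #define per parameter, sorted by name for a stable result."""
    defines = [f"#define {name} {value}" for name, value in sorted(params.items())]
    return "\n".join(defines) + "\n" + source


# Cache keyed by (source, parameters) so each distinct parameter set is
# generated (and, in a real system, compiled) only once.
_cache = {}


def get_specialized(source, params):
    key = (source, tuple(sorted(params.items())))
    if key not in _cache:
        _cache[key] = specialize(source, params)
    return _cache[key]


if __name__ == "__main__":
    # Parameters become known only at run time, e.g. from the problem instance.
    print(get_specialized(KERNEL_SOURCE, {"BLOCK_SIZE": 256, "SCALE": "2.0f"}))
```

Because the parameters reach the GPU compiler as constants rather than kernel arguments, it is free to apply optimizations such as constant folding and loop unrolling for each parameter set. The cache keeps repeated requests for the same set from triggering regeneration.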

Publication details

Source
IEEE International Parallel Distributed Processing Symposium | 2013 | pp. 1037-1048 | 12 pages
Conference location: Boston, MA (US)
Authors
Moore Nicholas; Leeser Miriam; King Laurie Smith
Original format: PDF
Keywords
GPU; compilation; performance; template matching

Similar literature

1. Performance improvements for iterative electron tomography reconstruction using graphics processing units (GPUs) [J]. Palenstijn W. J., Batenburg K. J., Sijbers J. Journal of Structural Biology. 2011, No. 2

2. High-performance iterative electron tomography reconstruction with long-object compensation using graphics processing units (GPUs) [J]. Xu Wei, Xu Fang, Jones Mel. Journal of Structural Biology. 2010, No. 2

3. Visualizing 3D/4D environmental data using many-core graphics processing units (GPUs) and multi-core central processing units (CPUs) [J]. Jing Li, Yunfeng Jiang, Chaowei Yang. Computers & Geosciences. 2013, Issue SEPa

4. Kernel Specialization for Improved Adaptability and Performance on Graphics Processing Units (GPUs) [C]. Moore Nicholas, Leeser Miriam, King Laurie Smith. IEEE International Parallel Distributed Processing Symposium. 2013

5. High performance multiscale image processing framework on multi-GPUs (graphics processing units) with applications to unbiased diffeomorphic atlas construction [D]. Ha, Linh Khanh. 2011

6. Graphics Processing Unit (GPU) implementation of image processing algorithms to improve system performance of the Control Acquisition Processing and Image Display System (CAPIDS) of the Micro-Angiographic Fluoroscope (MAF) [O]. S. N. Swetadri Vasan, Ciprian N. Ionita, A. H. Titus

7. Graphics processing unit (GPU) implementation of image processing algorithms to improve system performance of the control acquisition, processing, and image display system (CAPIDS) of the micro-angiographic fluoroscope (MAF) [O]. S. N. Swetadri Vasan, Ciprian N. Ionita, A. H. Titus. 2012
