Conference: IEEE International Parallel and Distributed Processing Symposium

Kernel Specialization for Improved Adaptability and Performance on Graphics Processing Units (GPUs)

Abstract

Graphics processing units (GPUs) offer significant speedups over CPUs for certain classes of applications. However, programming for GPUs is challenging. Many parameters affect performance, and their values may change depending on both the problem instance and GPU hardware specifics. In addition, most GPU kernels are compiled once, and performance optimizations are applied at application compile time. As a result, many GPU libraries and programs have limited adaptability to variations among problem instances and hardware configurations. These factors limit code reuse and the applicability of GPU computing to a wider variety of problems. This paper introduces GPGPU kernel specialization, a technique used to describe highly adaptable kernels that exhibit high performance across a wide range of programmer variables as well as different generations of GPUs. We also introduce our GPU Prototyping Framework (GPU-PF) for dynamic runtime generation of customized GPU kernels incorporating both problem-specific and implementation-specific parameters. GPU-PF fully separates the GPU and CPU code so that the GPU code can be compiled during program execution once all the parameters are known. This work explores the implementation and parameterization of two real-world applications targeting two generations of NVIDIA CUDA-enabled GPUs using kernel specialization and GPU-PF: large template matching and cone-beam image reconstruction via back projection. Starting with high-performance GPU kernels that compare favorably to multi-threaded reference implementations, kernel specialization is shown to increase adaptability while providing performance improvements, including improved run time and reduced resource usage. Kernel specialization offers productivity benefits, improved library code, and a means to increase the parameterizability of GPGPU implementations.
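
The core idea in the abstract, generating GPU source only once the parameters are known, can be illustrated with a small host-side sketch. This is not GPU-PF's API, which the abstract does not describe. The kernel `row_sum`, the parameters `block_size` and `template_width`, and the helper `specialize` are all hypothetical names chosen for illustration. The sketch only produces specialized CUDA source text. Handing that text to a runtime compiler (for example NVIDIA's NVRTC) is assumed and omitted here.

```python
from functools import lru_cache

# Generic CUDA kernel template. BLOCK_SIZE and TEMPLATE_WIDTH are hypothetical
# parameters for illustration; they are not taken from the paper.
KERNEL_TEMPLATE = """\
#define BLOCK_SIZE {block_size}
#define TEMPLATE_WIDTH {template_width}

extern "C" __global__ void row_sum(const float *in, float *out)
{{
    const int row = blockIdx.x * BLOCK_SIZE + threadIdx.x;
    float acc = 0.0f;
    // The trip count is a compile-time constant, so the compiler may unroll.
    #pragma unroll
    for (int i = 0; i < TEMPLATE_WIDTH; ++i)
        acc += in[row * TEMPLATE_WIDTH + i];
    out[row] = acc;
}}
"""


@lru_cache(maxsize=None)
def specialize(block_size: int, template_width: int) -> str:
    """Return CUDA source with the run-time parameters baked in as constants.

    Results are cached, so each distinct parameter set is generated once.
    """
    if block_size <= 0 or template_width <= 0:
        raise ValueError("parameters must be positive")
    return KERNEL_TEMPLATE.format(
        block_size=block_size, template_width=template_width
    )


if __name__ == "__main__":
    # Parameters that would only be known at run time in a real application.
    print(specialize(block_size=128, template_width=7))
```

Because the parameters become preprocessor constants, a compiler that sees the specialized source can treat loop bounds and index arithmetic as fixed values, which is one plausible route to the run-time and resource-usage improvements the abstract reports. Caching by parameter set keeps the cost of repeated specialization low when the same configuration recurs.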