Journal: IEEE Transactions on Emerging Topics in Computing

Pro++: A Profiling Framework for Primitive-Based GPU Programming


Abstract

Parallelizing software applications through existing optimized primitives is a common approach that strikes a middle ground between the complexity of manual parallelization and the lower efficiency of directive-based programming models. Parallel primitive libraries allow software engineers to map any sequential code to a target many-core architecture by identifying the most computationally intensive code sections and mapping them onto one or more existing primitives. On the other hand, the spread of this primitive-based programming model and the variety of graphics processing unit (GPU) architectures have led to a large and increasing number of third-party libraries, which often provide different implementations of the same primitive, each one optimized for a specific architecture. From the developer's point of view, this shifts the actual problem from parallelizing the software application to selecting, among the several implementations, the most efficient primitives for the target platform. This paper presents Pro++, a profiling framework for GPU primitives that measures the implementation quality of a given primitive by considering the characteristics of the target architecture. The framework collects the information provided by a standard GPU profiler and combines it into optimization criteria. The criteria evaluations are weighted to distinguish the impact of each optimization on the overall quality of the primitive implementation. This paper shows how the different weights have been tuned through the analysis of five of the most widespread existing primitive libraries, and how the framework has eventually been applied to improve the performance of two standard and widespread primitives.
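The abstract describes combining profiler information into optimization criteria and weighting them to obtain an overall quality measure. The Python sketch below illustrates that general idea only, as a normalized weighted sum. It is not the Pro++ formula: the criterion names (occupancy, memory_coalescing, branch_divergence), the weights, the scores, and the library names are all hypothetical placeholders, since the abstract does not list the actual criteria or their tuned weights.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Criterion:
    """One optimization criterion (hypothetical; not taken from the paper)."""
    name: str
    score: float   # assumed normalized to [0, 1], where 1 is best
    weight: float  # assumed relative impact on overall quality


def quality(criteria: Sequence[Criterion]) -> float:
    """Weighted average of criterion scores (an assumed aggregation rule)."""
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    for c in criteria:
        if not 0.0 <= c.score <= 1.0:
            raise ValueError(f"score for {c.name!r} must lie in [0, 1]")
    return sum(c.weight * c.score for c in criteria) / total_weight


def rank(implementations: Dict[str, Sequence[Criterion]]) -> List[str]:
    """Order implementation names from highest to lowest quality."""
    return sorted(implementations,
                  key=lambda name: quality(implementations[name]),
                  reverse=True)


# Illustrative weights only; the paper tunes its own weights empirically.
WEIGHTS = {"occupancy": 2.0, "memory_coalescing": 3.0, "branch_divergence": 1.0}


def make(scores: Dict[str, float]) -> List[Criterion]:
    """Attach the illustrative weights to a set of per-criterion scores."""
    return [Criterion(name, scores[name], w) for name, w in WEIGHTS.items()]


# Two made-up implementations of the same primitive on one target GPU.
implementations = {
    "library_a": make({"occupancy": 0.5, "memory_coalescing": 0.75,
                       "branch_divergence": 1.0}),
    "library_b": make({"occupancy": 1.0, "memory_coalescing": 0.25,
                       "branch_divergence": 0.5}),
}

if __name__ == "__main__":
    for name in rank(implementations):
        print(f"{name}: {quality(implementations[name]):.3f}")
```

In this sketch, changing the weights changes which implementation ranks first, which is why the weight tuning mentioned in the abstract matters for the final comparison.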
