Skeleton-based automatic parallelization of image processing algorithms for GPUs

2011 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation


Abstract

Graphics Processing Units (GPUs) are becoming increasingly important in high performance computing. To maintain high quality solutions, programmers have to efficiently parallelize and map their algorithms. This task is far from trivial, motivating the need to automate the process. In this paper, we present a technique to automatically parallelize and map sequential code onto a GPU without the need for code annotations. The technique is based on skeletonization and is targeted at image processing algorithms. Skeletonization separates the structure of a parallel computation from the algorithm's functionality, enabling efficient implementations without requiring architecture knowledge from the programmer. We define a number of skeleton classes, each enabling GPU-specific parallelization techniques and optimizations, including automatic thread creation, on-chip memory usage, and memory coalescing. Similar skeletonization techniques have recently been applied to GPUs; our work uses domain-specific skeletons and a finer-grained classification of algorithms. Compared to existing GPU code generators, skeleton-based parallelization can potentially achieve higher hardware efficiency because skeletons enable algorithm restructuring. In a set of benchmarks, we show that the presented skeleton-based approach generates highly optimized code that achieves high data throughput. We also show that the automatically generated code performs close to, or on par with, manually mapped and optimized code. We conclude that skeleton-based parallelization for GPUs is promising, but we believe that future research must focus on identifying a finer-grained and complete classification.
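
To make the skeleton idea concrete, below is a minimal CUDA sketch of a pixel-wise "map" skeleton: the skeleton fixes the parallel structure (thread creation, 2D indexing, bounds checking, coalesced row-major access), while the algorithm's functionality is supplied as a device functor. The mapSkeleton kernel, the Threshold functor, and the launch configuration are illustrative assumptions for this explanation only; they are not the skeleton classes or the generated code of the tool described in the paper.

#include <cuda_runtime.h>

// Illustrative pixel-wise "map" skeleton (hypothetical, for explanation only).
// The skeleton owns the parallel structure: one thread per pixel, 2D thread
// creation, bounds checks, and row-major indexing so that neighbouring
// threads access neighbouring pixels (coalesced global-memory accesses).
template <typename Op>
__global__ void mapSkeleton(const unsigned char* in, unsigned char* out,
                            int width, int height, Op op) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        int idx = y * width + x;   // row-major: adjacent threads -> adjacent pixels
        out[idx] = op(in[idx]);    // algorithm functionality supplied by the functor
    }
}

// The algorithm's functionality, kept separate from the parallel structure:
// a simple binary threshold on a grayscale image.
struct Threshold {
    unsigned char level;
    __device__ unsigned char operator()(unsigned char p) const {
        return p > level ? 255 : 0;
    }
};

int main() {
    const int width = 640, height = 480;
    const size_t bytes = static_cast<size_t>(width) * height;

    unsigned char *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemset(d_in, 128, bytes);   // dummy grayscale input image

    // The mapping decisions (block and grid size) would normally be hidden
    // inside the skeleton; they are spelled out here for clarity.
    dim3 block(32, 8);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    mapSkeleton<<<grid, block>>>(d_in, d_out, width, height, Threshold{100});
    cudaDeviceSynchronize();

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

A fuller skeleton library would add further classes, for example neighbourhood-based operations that stage pixels in on-chip shared memory before applying the functor; that step is omitted here for brevity.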
