Skeleton-based automatic parallelization of image processing algorithms for GPUs

2011 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation


Abstract

Graphics Processing Units (GPUs) are becoming increasingly important in high performance computing. To maintain high quality solutions, programmers have to efficiently parallelize and map their algorithms. This task is far from trivial, motivating the need to automate the process. In this paper, we present a technique to automatically parallelize and map sequential code onto a GPU without the need for code annotations. The technique is based on skeletonization and is targeted at image processing algorithms. Skeletonization separates the structure of a parallel computation from the algorithm's functionality, enabling efficient implementations without requiring architecture knowledge from the programmer. We define a number of skeleton classes, each enabling GPU-specific parallelization techniques and optimizations, including automatic thread creation, on-chip memory usage, and memory coalescing. Similar skeletonization techniques have recently been applied to GPUs; our work uses domain-specific skeletons and a finer-grained classification of algorithms. Compared to existing GPU code generators, skeleton-based parallelization can potentially achieve higher hardware efficiency because skeletons enable algorithm restructuring. In a set of benchmarks, we show that the presented skeleton-based approach generates highly optimized code that achieves high data throughput. We also show that the automatically generated code performs close to, or on par with, manually mapped and optimized code. We conclude that skeleton-based parallelization for GPUs is promising, but we believe that future research must focus on identifying a finer-grained and complete classification.
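
To make the skeleton idea concrete, below is a minimal CUDA sketch of a pixel-wise "map" skeleton: the skeleton fixes the parallel structure (thread creation, 2D indexing, bounds checking, coalesced row-major access), while the algorithm's functionality is supplied as a device functor. The mapSkeleton kernel, the Threshold functor, and the launch configuration are illustrative assumptions for this explanation only; they are not the skeleton classes or the generated code of the tool described in the paper.

#include <cuda_runtime.h>

// Illustrative pixel-wise "map" skeleton (hypothetical, for explanation only).
// The skeleton owns the parallel structure: one thread per pixel, 2D thread
// creation, bounds checks, and row-major indexing so that neighbouring
// threads access neighbouring pixels (coalesced global-memory accesses).
template <typename Op>
__global__ void mapSkeleton(const unsigned char* in, unsigned char* out,
                            int width, int height, Op op) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        int idx = y * width + x;   // row-major: adjacent threads -> adjacent pixels
        out[idx] = op(in[idx]);    // algorithm functionality supplied by the functor
    }
}

// The algorithm's functionality, kept separate from the parallel structure:
// a simple binary threshold on a grayscale image.
struct Threshold {
    unsigned char level;
    __device__ unsigned char operator()(unsigned char p) const {
        return p > level ? 255 : 0;
    }
};

int main() {
    const int width = 640, height = 480;
    const size_t bytes = static_cast<size_t>(width) * height;

    unsigned char *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemset(d_in, 128, bytes);   // dummy grayscale input image

    // The mapping decisions (block and grid size) would normally be hidden
    // inside the skeleton; they are spelled out here for clarity.
    dim3 block(32, 8);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    mapSkeleton<<<grid, block>>>(d_in, d_out, width, height, Threshold{100});
    cudaDeviceSynchronize();

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

A fuller skeleton library would add further classes, for example neighbourhood-based operations that stage pixels in on-chip shared memory before applying the functor; that step is omitted here for brevity.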
