Proceedings of the 2013 ACM SIGPLAN Conference on Programming Language Design and Implementation

Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines

Abstract

Image processing pipelines combine the challenges of stencil computations and stream programs. They are composed of large graphs of different stencil stages, as well as complex reductions, and stages with global or data-dependent access patterns. Because of their complex structure, the performance difference between a naive implementation of a pipeline and an optimized one is often an order of magnitude. Efficient implementations require optimization of both parallelism and locality, but due to the nature of stencils, there is a fundamental tension between parallelism, locality, and introducing redundant recomputation of shared values. We present a systematic model of the tradeoff space fundamental to stencil pipelines, a schedule representation which describes concrete points in this space for each stage in an image processing pipeline, and an optimizing compiler for the Halide image processing language that synthesizes high performance implementations from a Halide algorithm and a schedule. Combining this compiler with stochastic search over the space of schedules enables terse, composable programs to achieve state-of-the-art performance on a wide range of real image processing pipelines, and across different hardware architectures, including multicores with SIMD, and heterogeneous CPU+GPU execution. From simple Halide programs written in a few hours, we demonstrate performance up to 5× faster than hand-tuned C, intrinsics, and CUDA implementations optimized by experts over weeks or months, for image processing applications beyond the reach of past automatic compilers.
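The tension between locality, parallelism, and redundant recomputation can be made concrete with a small sketch. The code below is not Halide and is not taken from the paper. It is a plain Python illustration that assumes a hypothetical two-stage 1D pipeline (two chained 3-tap sums with clamped edges) and evaluates the same algorithm under three hand-written "schedules". All function names and the tile size are assumptions made for the example.

```python
def clamp(i, n):
    """Clamp index i into the valid range [0, n - 1]."""
    return max(0, min(i, n - 1))


def stage1(src, x):
    """First stage: 3-tap sum of the input around x (edges clamped)."""
    n = len(src)
    return src[clamp(x - 1, n)] + src[x] + src[clamp(x + 1, n)]


def run_breadth_first(src):
    """Compute all of stage 1, store it, then compute stage 2.
    No recomputation, but the whole intermediate is live at once."""
    n = len(src)
    tmp = [stage1(src, x) for x in range(n)]
    out = [tmp[clamp(x - 1, n)] + tmp[x] + tmp[clamp(x + 1, n)]
           for x in range(n)]
    return out, n  # n stage-1 evaluations


def run_fused(src):
    """Inline stage 1 into stage 2: no intermediate storage,
    but every stage-1 value is recomputed for each consumer."""
    n = len(src)
    evals = 0
    out = []
    for x in range(n):
        total = 0
        for dx in (-1, 0, 1):
            total += stage1(src, clamp(x + dx, n))
            evals += 1
        out.append(total)
    return out, evals


def run_tiled(src, tile):
    """Middle ground: per output tile, compute stage 1 over the tile
    plus a one-element halo into a small buffer, then consume it.
    Tiles are independent, so they could run in parallel; only the
    halo elements are recomputed across neighbouring tiles."""
    n = len(src)
    evals = 0
    out = []
    for t0 in range(0, n, tile):
        t1 = min(t0 + tile, n)
        lo, hi = max(t0 - 1, 0), min(t1, n - 1)  # inclusive halo bounds
        buf = [stage1(src, i) for i in range(lo, hi + 1)]
        evals += hi - lo + 1
        for x in range(t0, t1):
            out.append(sum(buf[clamp(x + dx, n) - lo] for dx in (-1, 0, 1)))
    return out, evals
```

All three functions return the same output values. They differ only in how many times the first stage is evaluated and in how much intermediate storage is live at once, which is the kind of choice the abstract attributes to the schedule rather than to the algorithm.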
