Conference: IEEE/ACM International Symposium on Code Generation and Optimization

From Loop Fusion to Kernel Fusion: A Domain-Specific Approach to Locality Optimization

Abstract

Optimizing data-intensive applications such as image processing for GPU targets with complex memory hierarchies requires exploring the tradeoffs among locality, parallelism, and computation. Loop fusion, one of the classical optimization techniques, has proven effective at improving locality at the function level. Image processing algorithms are growing in complexity and generally consist of many kernels arranged in a pipeline. The communication between these kernels is intensive and presents another opportunity to improve locality, this time at the system level. This paper focuses on an optimization technique called kernel fusion for improving data locality. We give a formal description of the problem by defining an objective function for locality optimization. By transforming the fusion problem into a graph partitioning problem, we propose a solution based on the minimum cut technique that searches for fusible kernels recursively. In addition, we develop an analytic model that quantitatively estimates the potential locality improvement by incorporating domain-specific knowledge and architecture details. The proposed technique is implemented in Hipacc, an image processing DSL and source-to-source compiler, and evaluated on six image processing applications running on three Nvidia GPUs. Our experiments show a geometric mean speedup of up to 2.52.
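
The abstract recasts kernel fusion as graph partitioning solved by recursive minimum cuts. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' Hipacc implementation. Kernels are vertices, an undirected edge weight approximates the locality benefit of fusing two communicating kernels (the demo weights are invented), and a hypothetical `is_fusible` predicate stands in for the paper's legality and resource checks. The abstract does not name a min-cut algorithm, so the Stoer-Wagner algorithm is used here as one standard choice. The helper names `stoer_wagner`, `partition_kernels`, and `fused_benefit` are all illustrative. Cutting the lightest edges forgoes the least benefit, so each bisection keeps the heaviest producer/consumer links inside one group.

```python
"""Hypothetical sketch: recursive min-cut partitioning of a kernel graph.

Assumptions (not from the paper): edge weights model the locality benefit of
fusing two communicating kernels, and `is_fusible` stands in for real
legality and resource checks.
"""
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[str, str, float]


def build_adjacency(nodes: Set[str], edges: List[Edge]) -> Dict[str, Dict[str, float]]:
    """Symmetric weights of the subgraph induced by `nodes` (0.0 = no edge)."""
    adj = {u: {v: 0.0 for v in nodes if v != u} for u in nodes}
    for u, v, w in edges:
        if u in nodes and v in nodes and u != v:
            adj[u][v] += w
            adj[v][u] += w
    return adj


def stoer_wagner(nodes: Set[str], edges: List[Edge]) -> Tuple[float, Set[str]]:
    """Global minimum cut of the induced subgraph: (cut weight, one side)."""
    adj = build_adjacency(nodes, edges)
    groups = {v: {v} for v in nodes}  # super-vertex -> original kernels merged into it
    active = sorted(nodes)            # sorted for a deterministic search order
    best_cut, best_side = float("inf"), set()
    while len(active) > 1:
        # Maximum-adjacency search: always add the most tightly connected vertex.
        order = [active[0]]
        conn = {v: adj[active[0]][v] for v in active[1:]}
        while conn:
            nxt = max(conn, key=conn.get)
            order.append(nxt)
            del conn[nxt]
            for v in conn:
                conn[v] += adj[nxt][v]
        s, t = order[-2], order[-1]
        # Cut of the phase: weight between the last vertex t and all others.
        cut = sum(adj[t][v] for v in active if v != t)
        if cut < best_cut:
            best_cut, best_side = cut, set(groups[t])
        # Contract t into s, summing parallel edge weights.
        groups[s] |= groups[t]
        for v in active:
            if v not in (s, t):
                adj[s][v] += adj[t][v]
                adj[v][s] = adj[s][v]
        active.remove(t)
    return best_cut, best_side


def partition_kernels(nodes: Set[str], edges: List[Edge],
                      is_fusible: Callable[[Set[str]], bool]) -> List[Set[str]]:
    """Recursively bisect along minimum cuts until every group is fusible."""
    if len(nodes) <= 1 or is_fusible(nodes):
        return [set(nodes)]
    _, side = stoer_wagner(nodes, edges)
    rest = set(nodes) - side
    return (partition_kernels(side, edges, is_fusible)
            + partition_kernels(rest, edges, is_fusible))


def fused_benefit(groups: List[Set[str]], edges: List[Edge]) -> float:
    """Toy objective: total edge weight kept inside fused groups."""
    return sum(w for u, v, w in edges if any(u in g and v in g for g in groups))


if __name__ == "__main__":
    # Invented six-kernel edge-detection pipeline; weights are illustrative only.
    edges = [("load", "blur_x", 8), ("blur_x", "blur_y", 8),
             ("blur_y", "sobel_x", 1), ("blur_y", "sobel_y", 1),
             ("sobel_x", "magnitude", 8), ("sobel_y", "magnitude", 8)]
    nodes = {k for u, v, _ in edges for k in (u, v)}
    groups = partition_kernels(nodes, edges, lambda g: len(g) <= 3)  # hypothetical limit
    print(sorted(sorted(g) for g in groups))
    # [['blur_x', 'blur_y', 'load'], ['magnitude', 'sobel_x', 'sobel_y']]
    print(fused_benefit(groups, edges))  # 32
```

With a hypothetical limit of three kernels per group, the two light blur-to-Sobel edges (total weight 2) are cut. Both blur stages stay with the load kernel, and both Sobel stages stay with the magnitude kernel. A real cost model, such as the analytic model the abstract describes, would replace both the invented weights and the size-based fusibility test.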
