Scalable Kernel Fusion for Memory-Bound GPU Applications

Abstract

GPU implementations of HPC applications that rely on finite difference methods can include tens of memory-bound kernels. Kernel fusion can improve performance by reducing data traffic to off-chip memory: kernels that share data arrays are fused into larger kernels, in which on-chip cache holds the data reused by instructions originating from different original kernels. The main challenges are a) searching for the optimal kernel fusions under the constraints imposed by data dependencies and kernel precedence, and b) applying kernel fusion effectively enough to achieve a speedup. This paper introduces a definition of the problem and proposes a scalable method for searching the space of possible kernel fusions to identify optimal fusions for large problems. The paper also proposes a codeless performance upper-bound projection model to achieve effective fusions. Results show that the proposed scalable kernel fusion method improved the performance of two real-world applications, each containing tens of kernels, by 1.35x and 1.2x, respectively.
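The fusion search described above can be illustrated with a minimal Python sketch. It is not the authors' search method or their projection model; it rests on several simplifying assumptions. Each kernel is summarized only by the arrays it reads and writes (`Kernel`, `traffic`, `best_fusion` and the `max_arrays` budget are hypothetical names). Kernels keep a fixed launch order, so only contiguous runs may be fused, which trivially respects their precedence. A fused group may touch at most `max_arrays` distinct arrays, a crude stand-in for on-chip capacity. Finally, runtime is taken to be proportional to off-chip traffic, so the ratio of unfused to fused traffic serves as a rough speedup upper bound.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    """A memory-bound kernel described only by the arrays it reads and writes."""
    name: str
    reads: frozenset[str]
    writes: frozenset[str]


def footprint(group: list[Kernel]) -> set[str]:
    """Distinct arrays touched by a (possibly fused) group of kernels."""
    arrays: set[str] = set()
    for k in group:
        arrays |= k.reads | k.writes
    return arrays


def traffic(group: list[Kernel], sizes: dict[str, int]) -> int:
    """Off-chip traffic if `group` runs as one fused kernel.

    Assumptions: each array is loaded at most once, values produced earlier in
    the group are reused from on-chip storage, and every written array is
    still stored to off-chip memory exactly once.
    """
    loaded: set[str] = set()
    written: set[str] = set()
    for k in group:
        loaded |= k.reads - written  # reads of earlier outputs stay on chip
        written |= k.writes
    return sum(sizes[a] for a in loaded) + sum(sizes[a] for a in written)


def best_fusion(kernels: list[Kernel], sizes: dict[str, int], max_arrays: int):
    """Split the kernel sequence into contiguous fused groups with minimal traffic.

    Contiguous groups preserve the original launch order. `max_arrays` is a
    hypothetical capacity budget; a single kernel is always allowed.
    """
    n = len(kernels)
    best = [0] + [math.inf] * n  # best[j]: minimal traffic for kernels[:j]
    cut = [0] * (n + 1)          # cut[j]: start index of the last group
    for j in range(1, n + 1):
        for i in range(j):
            group = kernels[i:j]
            if len(group) > 1 and len(footprint(group)) > max_arrays:
                continue  # fused kernel would exceed the capacity budget
            cost = best[i] + traffic(group, sizes)
            if cost < best[j]:
                best[j], cut[j] = cost, i
    groups, j = [], n
    while j > 0:  # walk the cut points backwards to recover the groups
        groups.append([k.name for k in kernels[cut[j]:j]])
        j = cut[j]
    return best[n], groups[::-1]


if __name__ == "__main__":
    ks = [
        Kernel("k1", frozenset({"u"}), frozenset({"a"})),
        Kernel("k2", frozenset({"u", "a"}), frozenset({"b"})),
        Kernel("k3", frozenset({"b", "v"}), frozenset({"c"})),
    ]
    sizes = dict.fromkeys("uvabc", 1)  # one unit per array, purely illustrative
    unfused = sum(traffic([k], sizes) for k in ks)
    fused, groups = best_fusion(ks, sizes, max_arrays=3)
    print(groups)  # [['k1', 'k2'], ['k3']]
    print(f"traffic {unfused} -> {fused}, bound {unfused / fused:.2f}x")  # traffic 8 -> 6, bound 1.33x
```

Under this traffic model, merging two groups never increases traffic, so the capacity budget is what keeps the search from fusing everything: with `max_arrays=5` the three kernels collapse into one group moving 5 units. The dynamic program enumerates O(n²) contiguous groups and therefore covers only the fixed-order case. Real applications form a dependency graph rather than a fixed sequence, and dependencies such as stencil halos that span thread blocks can forbid fusion outright; neither is modelled here.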