On Optimizing Complex Stencils on GPUs

Abstract

Stencil computations are often the compute-intensive kernel in many scientific applications. With the increasing demand for computational accuracy, and the emergence of massively data-parallel high-bandwidth architectures like GPUs, stencils have steadily become more complex in terms of the stencil order, data accesses, and reuse patterns. Many prior efforts have focused on optimizing simpler stencil computations on various platforms. However, existing stencil code generators face challenges in optimizing such complex multi-statement stencil DAGs. This paper addresses the challenges in optimizing high-order stencil DAGs on GPUs by focusing on two key considerations: (1) enabling the domain expert to guide the code optimization, which may otherwise be extremely challenging for complex stencils; and (2) using bottleneck analysis via runtime profiling to guide the application of optimizations, and the tuning of various code generation parameters. We implement these abstractions in a prototype code generation framework termed Artemis, and evaluate its efficacy over multiple stencil kernels with varying complexity and operational intensity on an NVIDIA P100 GPU.
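As a rough illustration of what growing stencil order means for data accesses, the sketch below applies a symmetric 1D stencil on the CPU in plain Python. It assumes "order" loosely means the number of neighbours read on each side of the centre point; the function name `stencil_1d` and the coefficient lists are hypothetical, chosen for this example, and are not taken from Artemis or its generated code.

```python
def stencil_1d(u, weights):
    """Apply a symmetric 1D stencil of radius r = len(weights) - 1.

    weights[0] multiplies the centre point; weights[k] multiplies both
    neighbours at distance k. Points within r of either end are copied
    through unchanged (an assumed boundary treatment for this sketch).
    """
    r = len(weights) - 1
    out = list(u)
    for i in range(r, len(u) - r):
        acc = weights[0] * u[i]
        for k in range(1, r + 1):
            # Each extra unit of radius adds two more reads per output point.
            acc += weights[k] * (u[i - k] + u[i + k])
        out[i] = acc
    return out


# Standard central-difference coefficients for the second derivative
# on a unit-spaced grid.
RADIUS_1 = [-2.0, 1.0]                            # 3-point stencil
RADIUS_2 = [-5.0 / 2.0, 4.0 / 3.0, -1.0 / 12.0]   # 5-point stencil

u = [float(x * x) for x in range(8)]  # u(x) = x^2, so u''(x) = 2

low = stencil_1d(u, RADIUS_1)
high = stencil_1d(u, RADIUS_2)
```

The radius-2 version reads five inputs per output instead of three, and neighbouring outputs share four of them. That overlap is the kind of reuse pattern a GPU code generator would try to exploit through shared memory or registers.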