On Optimizing Complex Stencils on GPUs

Abstract

Stencil computations are often the compute-intensive kernel in many scientific applications. With the increasing demand for computational accuracy, and the emergence of massively data-parallel high-bandwidth architectures like GPUs, stencils have steadily become more complex in terms of the stencil order, data accesses, and reuse patterns. Many prior efforts have focused on optimizing simpler stencil computations on various platforms. However, existing stencil code generators face challenges in optimizing such complex multi-statement stencil DAGs. This paper addresses the challenges in optimizing high-order stencil DAGs on GPUs by focusing on two key considerations: (1) enabling the domain expert to guide the code optimization, which may otherwise be extremely challenging for complex stencils; and (2) using bottleneck analysis via runtime profiling to guide the application of optimizations, and the tuning of various code generation parameters. We implement these abstractions in a prototype code generation framework termed Artemis, and evaluate its efficacy over multiple stencil kernels with varying complexity and operational intensity on an NVIDIA P100 GPU.

Bibliographic details

Source
IEEE International Parallel and Distributed Processing Symposium, 2019, pp. 641-652 (12 pages)
Authors
Prashant Rawat; Miheer Vaidya; Aravind Sukumaran-Rajam; Atanas Rountev; Louis-Noel Pouchet; P. Sadayappan
Original format: PDF
Keywords
Stencil Computations; Code optimization; DSL; GPGPU

Similar literature

1. Register Optimizations for Stencils on GPUs [J]. Prashant Singh Rawat, Aravind Sukumaran-Rajam, Atanas Rountev. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages, 2018, No. 1.
2. A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations [J]. Jiayuan Meng, Kevin Skadron. International Journal of Parallel Programming, 2011, No. 1.
3. On Optimizing Complex Stencils on GPUs [C]. Prashant Rawat, Miheer Vaidya, Aravind Sukumaran-Rajam. IEEE International Parallel and Distributed Processing Symposium, 2019.
4. Optimization of Stencil Computations on GPUs [D]. Rawat, Prashant Singh. 2018.
5. Next-generation acceleration and code optimization for light transport in turbid media using GPUs [O]. Erik Alerstam, William Chun Yip Lo, Tianyi David Han. 2010.
6. A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations [O]. Jiayuan Meng, Kevin Skadron. 2012.
