Venue: International Symposium on Embedded Multicore/Many-core Systems-on-Chip

A Performance Model and Efficiency-Based Assignment of Buffering Strategies for Automatic GPU Stencil Code Generation



Abstract

Stencil computations form the basis for computer simulations across almost every field of science, such as computational fluid dynamics, data mining, and image processing. Their mostly regular data access patterns potentially enable them to take advantage of the high computation and data bandwidth of GPUs, but only if data buffering and other issues are handled properly. Generating good code presents a number of challenges, one of which is making the best use of memory. GPUs have three types of on-chip storage: registers, shared memory, and read-only cache. Choosing a buffering strategy, i.e., which type of storage to use for each stencil array (grid function, GF) and how to use it, requires not only a good understanding of the GF's stencil pattern but also of how efficiently each type of storage serves that GF, to avoid squandering storage that would be more beneficial to another GF. Our code-generation framework supports five buffering strategies. For a stencil computation with N GFs, the total number of possible assignments is 5^N. Large, complex stencil kernels may consist of dozens of GFs, resulting in significant search overhead. In this work, we present an analytic performance model for stencil computations on GPUs, and study the behavior of the read-only cache and L2 cache. Next, we propose an efficiency-based assignment algorithm, which scores a change in buffering strategy for a GF using a combination of (a) the predicted execution time and (b) on-chip storage usage. Using this scoring, an assignment for N GFs and b strategy types can be determined in (b - 1)N(N + 1)/2 steps. Results show that the performance model has good accuracy and that the assignment strategy is highly efficient.
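The efficiency-based assignment described above can be illustrated with a small greedy sketch. Everything here is hypothetical: the strategy names, the toy cost and storage tables, and the scoring function merely stand in for the paper's analytic performance model and storage accounting; they are not the authors' implementation. The loop repeatedly applies the single-GF strategy change with the best efficiency (predicted time saved per unit of extra on-chip storage) until no change helps or the storage budget is exhausted.

```python
# Hypothetical sketch of an efficiency-based buffering-strategy assignment.
# The cost/storage tables below are illustrative placeholders for the
# paper's analytic performance model, not measured values.

STRATEGIES = ["plain", "register", "shared", "readonly", "shared+register"]  # b = 5

def predicted_time(assignment):
    # Stand-in for the analytic performance model (lower is better).
    cost = {"plain": 4.0, "register": 2.0, "shared": 1.5,
            "readonly": 2.5, "shared+register": 1.2}
    return sum(cost[s] for s in assignment)

def storage_used(assignment):
    # Stand-in for per-GF on-chip storage accounting (registers/shared bytes).
    use = {"plain": 0, "register": 2, "shared": 4,
           "readonly": 1, "shared+register": 5}
    return sum(use[s] for s in assignment)

def assign(n_gfs, budget):
    """Greedy assignment: start every GF on the cheapest strategy, then
    repeatedly apply the single-GF change with the best score, i.e. the
    largest predicted time saved per unit of extra storage spent."""
    current = ["plain"] * n_gfs
    while True:
        best = None  # (score, gf_index, strategy)
        base_t = predicted_time(current)
        base_s = storage_used(current)
        for i in range(n_gfs):
            for s in STRATEGIES:
                if s == current[i]:
                    continue
                trial = current[:i] + [s] + current[i + 1:]
                dt = base_t - predicted_time(trial)   # time saved
                ds = storage_used(trial) - base_s     # storage spent
                if dt <= 0 or storage_used(trial) > budget:
                    continue
                score = dt / max(ds, 1)  # efficiency: benefit per storage
                if best is None or score > best[0]:
                    best = (score, i, s)
        if best is None:
            return current  # no improving change remains
        current[best[1]] = best[2]
```

Because each accepted change strictly lowers the predicted time and the strategy set is finite, the loop terminates; the abstract's (b - 1)N(N + 1)/2 bound corresponds to the number of candidate evaluations such a scheme needs in the authors' formulation.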
