International Symposium on Embedded Multicore/Many-core Systems-on-Chip

A Performance Model and Efficiency-Based Assignment of Buffering Strategies for Automatic GPU Stencil Code Generation



Abstract

Stencil computations form the basis for computer simulations across almost every field of science, such as computational fluid dynamics, data mining, and image processing. Their mostly regular data access patterns potentially enable them to take advantage of the high computation and data bandwidth of GPUs, but only if data buffering and other issues are handled properly. Finding a good code-generation scheme presents a number of challenges, one of which is making the best use of memory. GPUs have three types of on-chip storage: registers, shared memory, and read-only cache. The choice of storage type and how it is used (a buffering strategy) for each stencil array (grid function, GF) requires a good understanding not only of its stencil pattern but also of the efficiency of each type of storage for that GF, to avoid squandering storage that would be more beneficial to another GF. Our code-generation framework supports five buffering strategies. For a stencil computation with N GFs, the total number of possible assignments is 5^N. Large, complex stencil kernels may consist of dozens of GFs, resulting in significant search overhead. In this work, we present an analytic performance model for stencil computations on GPUs, and study the behavior of read-only cache and L2 cache. Next, we propose an efficiency-based assignment algorithm, which operates by scoring a change in buffering strategy for a GF using a combination of (a) the predicted execution time and (b) on-chip storage usage. Using this scoring, an assignment for N GFs and b strategy types can be determined in (b - 1)N(N + 1)/2 steps. Results show that the performance model has good accuracy and that the assignment strategy is highly efficient.
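The (b - 1)N(N + 1)/2 step count in the abstract is consistent with a round-based greedy search: in each round, every not-yet-committed GF is scored under its b - 1 alternative strategies, the single best change is applied, and one GF is locked per round, so the evaluation count is (b - 1)(N + (N - 1) + ... + 1). A minimal sketch of such a loop, assuming a caller-supplied cost function that combines predicted time and storage pressure (the function names, strategy names, and toy cost table below are illustrative, not taken from the paper):

```python
def assign_strategies(gfs, strategies, score, baseline):
    """Greedy efficiency-based assignment (hypothetical sketch).

    gfs        -- list of N grid-function names
    strategies -- list of b buffering-strategy names
    score(a)   -- cost of a full assignment dict; lower is better
                  (e.g., predicted execution time plus storage pressure)
    baseline   -- starting strategy for every GF
    """
    assignment = {g: baseline for g in gfs}
    remaining = list(gfs)
    evals = 0
    while remaining:
        best = None
        for g in remaining:
            # Score the b - 1 alternatives to g's current strategy.
            for s in strategies:
                if s == assignment[g]:
                    continue
                trial = dict(assignment)
                trial[g] = s
                evals += 1
                cost = score(trial)
                if best is None or cost < best[0]:
                    best = (cost, g, s)
        cost, g, s = best
        if cost < score(assignment):  # apply only improving changes
            assignment[g] = s
        remaining.remove(g)  # commit one GF per round
    return assignment, evals


# Toy cost model: per-(GF, strategy) costs standing in for the
# model-predicted time plus storage usage.
costs = {("u", "register"): 5, ("u", "shared"): 2, ("u", "ro_cache"): 4,
         ("v", "register"): 3, ("v", "shared"): 6, ("v", "ro_cache"): 1,
         ("f", "register"): 2, ("f", "shared"): 2, ("f", "ro_cache"): 3}
score = lambda a: sum(costs[(g, s)] for g, s in a.items())

assignment, evals = assign_strategies(
    ["u", "v", "f"], ["register", "shared", "ro_cache"], score, "register")
# With N = 3 GFs and b = 3 strategies: evals = (3 - 1) * 3 * 4 / 2 = 12
```

Each round narrows the search by one GF, so the quadratic step count follows directly from the shrinking candidate set; this is in contrast to the 5^N cost of exhaustive enumeration.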
