Venue: International Symposium on Embedded Multicore/Many-core Systems-on-Chip

A Performance Model and Efficiency-Based Assignment of Buffering Strategies for Automatic GPU Stencil Code Generation



Abstract

Stencil computations form the basis for computer simulations across almost every field of science, such as computational fluid dynamics, data mining, and image processing. Their mostly regular data access patterns potentially enable them to take advantage of the high computation and data bandwidth of GPUs, but only if data buffering and other issues are handled properly. Generating good code presents a number of challenges, one of which is making the best use of memory. GPUs have three types of on-chip storage: registers, shared memory, and read-only cache. Choosing a buffering strategy, i.e., which type of storage to use for each stencil array (grid function, GF) and how to use it, requires not only a good understanding of the GF's stencil pattern but also of how efficiently each type of storage serves that GF, to avoid squandering storage that would be more beneficial to another GF. Our code-generation framework supports five buffering strategies. For a stencil computation with N GFs, the total number of possible assignments is 5^N. Large, complex stencil kernels may consist of dozens of GFs, resulting in significant search overhead. In this work, we present an analytic performance model for stencil computations on GPUs, and study the behavior of the read-only cache and L2 cache. Next, we propose an efficiency-based assignment algorithm, which scores a change in buffering strategy for a GF using a combination of (a) the predicted execution time and (b) on-chip storage usage. Using this scoring, an assignment for N GFs and b strategy types can be determined in (b - 1)N(N + 1)/2 steps. Results show that the performance model has good accuracy and that the assignment strategy is highly efficient.
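The efficiency-based assignment described above can be illustrated with a small greedy sketch. Everything here is hypothetical: the strategy names, the toy cost and storage tables, and the scoring function merely stand in for the paper's analytic performance model and storage accounting; they are not the authors' implementation. The loop repeatedly applies the single-GF strategy change with the best efficiency (predicted time saved per unit of extra on-chip storage) until no change helps or the storage budget is exhausted.

```python
# Hypothetical sketch of an efficiency-based buffering-strategy assignment.
# The cost/storage tables below are illustrative placeholders for the
# paper's analytic performance model, not measured values.

STRATEGIES = ["plain", "register", "shared", "readonly", "shared+register"]  # b = 5

def predicted_time(assignment):
    # Stand-in for the analytic performance model (lower is better).
    cost = {"plain": 4.0, "register": 2.0, "shared": 1.5,
            "readonly": 2.5, "shared+register": 1.2}
    return sum(cost[s] for s in assignment)

def storage_used(assignment):
    # Stand-in for per-GF on-chip storage accounting (registers/shared bytes).
    use = {"plain": 0, "register": 2, "shared": 4,
           "readonly": 1, "shared+register": 5}
    return sum(use[s] for s in assignment)

def assign(n_gfs, budget):
    """Greedy assignment: start every GF on the cheapest strategy, then
    repeatedly apply the single-GF change with the best score, i.e. the
    largest predicted time saved per unit of extra storage spent."""
    current = ["plain"] * n_gfs
    while True:
        best = None  # (score, gf_index, strategy)
        base_t = predicted_time(current)
        base_s = storage_used(current)
        for i in range(n_gfs):
            for s in STRATEGIES:
                if s == current[i]:
                    continue
                trial = current[:i] + [s] + current[i + 1:]
                dt = base_t - predicted_time(trial)   # time saved
                ds = storage_used(trial) - base_s     # storage spent
                if dt <= 0 or storage_used(trial) > budget:
                    continue
                score = dt / max(ds, 1)  # efficiency: benefit per storage
                if best is None or score > best[0]:
                    best = (score, i, s)
        if best is None:
            return current  # no improving change remains
        current[best[1]] = best[2]
```

Because each accepted change strictly lowers the predicted time and the strategy set is finite, the loop terminates; the abstract's (b - 1)N(N + 1)/2 bound corresponds to the number of candidate evaluations such a scheme needs in the authors' formulation.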
