International Symposium on Embedded Multicore/Many-core Systems-on-Chip

A Performance Model and Efficiency-Based Assignment of Buffering Strategies for Automatic GPU Stencil Code Generation



Abstract

Stencil computations form the basis for computer simulations across almost every field of science, such as computational fluid dynamics, data mining, and image processing. Their mostly regular data access patterns potentially enable them to take advantage of the high computation and data bandwidth of GPUs, but only if data buffering and other issues are handled properly. Finding a good code-generation scheme presents a number of challenges, one of which is making the best use of memory. GPUs have three types of on-chip storage: registers, shared memory, and read-only cache. The choice of storage type and how it is used (a buffering strategy) for each stencil array (grid function, GF) requires a good understanding not only of its stencil pattern but also of the efficiency of each type of storage for that GF, to avoid squandering storage that would be more beneficial to another GF. Our code-generation framework supports five buffering strategies. For a stencil computation with N GFs, the total number of possible assignments is 5^N. Large, complex stencil kernels may consist of dozens of GFs, resulting in significant search overhead. In this work, we present an analytic performance model for stencil computations on GPUs, and study the behavior of read-only cache and L2 cache. Next, we propose an efficiency-based assignment algorithm, which operates by scoring a change in buffering strategy for a GF using a combination of (a) the predicted execution time and (b) on-chip storage usage. Using this scoring, an assignment for N GFs and b strategy types can be determined in (b - 1)N(N + 1)/2 steps. Results show that the performance model has good accuracy and that the assignment strategy is highly efficient.
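The (b - 1)N(N + 1)/2 step count in the abstract is consistent with a round-based greedy search: in each round, every not-yet-committed GF is scored under its b - 1 alternative strategies, the single best change is applied, and one GF is locked per round, so the evaluation count is (b - 1)(N + (N - 1) + ... + 1). A minimal sketch of such a loop, assuming a caller-supplied cost function that combines predicted time and storage pressure (the function names, strategy names, and toy cost table below are illustrative, not taken from the paper):

```python
def assign_strategies(gfs, strategies, score, baseline):
    """Greedy efficiency-based assignment (hypothetical sketch).

    gfs        -- list of N grid-function names
    strategies -- list of b buffering-strategy names
    score(a)   -- cost of a full assignment dict; lower is better
                  (e.g., predicted execution time plus storage pressure)
    baseline   -- starting strategy for every GF
    """
    assignment = {g: baseline for g in gfs}
    remaining = list(gfs)
    evals = 0
    while remaining:
        best = None
        for g in remaining:
            # Score the b - 1 alternatives to g's current strategy.
            for s in strategies:
                if s == assignment[g]:
                    continue
                trial = dict(assignment)
                trial[g] = s
                evals += 1
                cost = score(trial)
                if best is None or cost < best[0]:
                    best = (cost, g, s)
        cost, g, s = best
        if cost < score(assignment):  # apply only improving changes
            assignment[g] = s
        remaining.remove(g)  # commit one GF per round
    return assignment, evals


# Toy cost model: per-(GF, strategy) costs standing in for the
# model-predicted time plus storage usage.
costs = {("u", "register"): 5, ("u", "shared"): 2, ("u", "ro_cache"): 4,
         ("v", "register"): 3, ("v", "shared"): 6, ("v", "ro_cache"): 1,
         ("f", "register"): 2, ("f", "shared"): 2, ("f", "ro_cache"): 3}
score = lambda a: sum(costs[(g, s)] for g, s in a.items())

assignment, evals = assign_strategies(
    ["u", "v", "f"], ["register", "shared", "ro_cache"], score, "register")
# With N = 3 GFs and b = 3 strategies: evals = (3 - 1) * 3 * 4 / 2 = 12
```

Each round narrows the search by one GF, so the quadratic step count follows directly from the shrinking candidate set; this is in contrast to the 5^N cost of exhaustive enumeration.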
