Journal: Future Generation Computer Systems

Improving locality of explicit one-step methods on GPUs by tiling across stages and time steps


Abstract

The performance of explicit parallel methods solving large systems of ordinary differential equations (ODEs) on GPUs is often memory bound. Therefore, locality optimizations, such as kernel fusion, are desirable. This paper exploits a special property of a large class of right-hand-side (RHS) functions to enable the fusion of computations of blocks of components of dependent stages of the method. This allows the derivation of tilings of the stages not only within one time step, but also spanning several successive time steps. Our approach is based on a representation of the ODE method by a data flow graph and allows efficient GPU code with fused kernels to be generated automatically for user-defined tilings. In particular, we investigate two generalized tiling strategies, trapezoidal and hexagonal tiling, and two different partitionings, which are evaluated experimentally for several different high- and low-order Runge-Kutta (RK) methods. (C) 2019 Elsevier B.V. All rights reserved.
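The fusion of dependent stages can be illustrated with a small sketch. It assumes that the RHS property in question is a limited access distance: component j of the RHS reads only components j-1, j, and j+1 of its argument. The abstract does not spell this out, so treat it as an assumption. The three-point stencil, Heun's two-stage method, and the names `stencil`, `heun_step_unfused`, and `heun_step_tiled` are all hypothetical choices made for illustration. The tiled variant recomputes a one-component halo of the first stage so that each tile is independent, which gives a trapezoid-shaped tile over the stages. It runs on the CPU in plain Python and only demonstrates the dependency structure; the paper's actual tiling schemes and generated GPU kernels may differ.

```python
def stencil(get, j):
    """Component j of a hypothetical RHS with access distance 1."""
    return get(j - 1) - 2.0 * get(j) + get(j + 1)


def heun_step_unfused(y, h):
    """One Heun (2-stage RK) step; each stage sweeps the whole vector."""
    n = len(y)
    get_y = lambda j: y[j] if 0 <= j < n else 0.0   # zero boundary values
    k1 = [stencil(get_y, j) for j in range(n)]      # stage 1, full sweep
    y2 = [y[j] + h * k1[j] for j in range(n)]       # argument of stage 2
    get_y2 = lambda j: y2[j] if 0 <= j < n else 0.0
    k2 = [stencil(get_y2, j) for j in range(n)]     # stage 2, full sweep
    return [y[j] + 0.5 * h * (k1[j] + k2[j]) for j in range(n)]


def heun_step_tiled(y, h, tile):
    """Same step, but both stages are fused per tile of components."""
    n = len(y)
    get_y = lambda j: y[j] if 0 <= j < n else 0.0
    out = [0.0] * n
    for lo in range(0, n, tile):
        hi = min(lo + tile, n)
        # Stage 2 on [lo, hi) reads stage-1 results on [lo-1, hi+1),
        # so stage 1 is computed redundantly on a halo of width 1.
        halo_lo, halo_hi = max(lo - 1, 0), min(hi + 1, n)
        k1 = {j: stencil(get_y, j) for j in range(halo_lo, halo_hi)}
        y2 = {j: y[j] + h * k1[j] for j in range(halo_lo, halo_hi)}
        # Indices missing from y2 lie outside the domain (zero boundary).
        get_y2 = lambda j: y2.get(j, 0.0)
        for j in range(lo, hi):
            k2 = stencil(get_y2, j)
            out[j] = y[j] + 0.5 * h * (k1[j] + k2)
    return out


if __name__ == "__main__":
    y0 = [float((3 * i) % 7) for i in range(23)]
    a = heun_step_unfused(y0, 0.1)
    b = heun_step_tiled(y0, 0.1, tile=5)
    print(max(abs(u - v) for u, v in zip(a, b)))
```

In the tiled variant the intermediate values `k1` and `y2` live only inside one tile, which on a GPU would correspond to keeping them in fast on-chip memory inside a single fused kernel instead of writing them to global memory between stage kernels. The price in this sketch is the redundant halo work, which grows with the number of fused stages and with the access distance.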
