Matrix-free GPU implementation of a preconditioned conjugate gradient solver for anisotropic elliptic PDEs

Abstract

Many problems in geophysical and atmospheric modelling require the fast solution of elliptic partial differential equations (PDEs) in “flat” three-dimensional geometries. In particular, an anisotropic elliptic PDE for the pressure correction has to be solved at every time step in the dynamical core of many numerical weather prediction (NWP) models, and equations of a very similar structure arise in global ocean models, subsurface flow simulations, and oil and gas reservoir modelling. The elliptic solve is often the bottleneck of the forecast, and to meet operational requirements an algorithmically optimal method has to be used and implemented efficiently. Graphics Processing Units (GPUs) have been shown to be highly efficient (both in terms of absolute performance and power consumption) for a wide range of applications in scientific computing, and recently iterative solvers have been parallelised on these architectures. In this article we describe the GPU implementation and optimisation of a Preconditioned Conjugate Gradient (PCG) algorithm for the solution of a three-dimensional anisotropic elliptic PDE for the pressure correction in NWP. Our implementation exploits the strong vertical anisotropy of the elliptic operator in the construction of a suitable preconditioner. As the algorithm is memory bound, performance can be improved significantly by reducing the amount of global memory access. We achieve this by using a matrix-free implementation, which does not require explicit storage of the matrix and instead recalculates the local stencil. Global memory access can also be reduced by rewriting the PCG algorithm using loop fusion, and we show that this further reduces the runtime on the GPU. We demonstrate the performance of our matrix-free GPU code by comparing it both to a sequential CPU implementation and to a matrix-explicit GPU code which uses existing CUDA libraries. The absolute performance of the algorithm for different problem sizes is quantified in terms of floating point throughput and global memory bandwidth.
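
The ideas in the abstract (matrix-free stencil application, a preconditioner built on the vertical anisotropy, and loop fusion) can be illustrated with a minimal sequential sketch. It is written in plain Python rather than CUDA and is not the authors' code. The abstract does not specify the discretisation or the preconditioner, so everything here is an assumption for illustration: a constant-coefficient 7-point stencil with horizontal coupling `ah`, vertical coupling `av` (with `av` much larger than `ah`), a zero-order term `c`, zero Dirichlet boundaries, and a block-Jacobi preconditioner that solves a tridiagonal system in each vertical column. The function names `apply_operator`, `apply_preconditioner` and `pcg` are hypothetical.

```python
def apply_operator(u, nx, ny, nz, ah, av, c):
    """Matrix-free y = A u: the stencil is recomputed on the fly, never stored."""
    y = [0.0] * len(u)
    diag = 4.0 * ah + 2.0 * av + c
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                p = (i * ny + j) * nz + k  # vertical index k runs fastest
                s = diag * u[p]
                if i > 0:
                    s -= ah * u[p - ny * nz]
                if i < nx - 1:
                    s -= ah * u[p + ny * nz]
                if j > 0:
                    s -= ah * u[p - nz]
                if j < ny - 1:
                    s -= ah * u[p + nz]
                if k > 0:
                    s -= av * u[p - 1]
                if k < nz - 1:
                    s -= av * u[p + 1]
                y[p] = s
    return y


def apply_preconditioner(r, nx, ny, nz, ah, av, c):
    """z = M^{-1} r: tridiagonal (Thomas) solve in every vertical column."""
    z = [0.0] * len(r)
    diag = 4.0 * ah + 2.0 * av + c
    for col in range(nx * ny):
        base = col * nz
        cp = [0.0] * nz  # modified super-diagonal
        dp = [0.0] * nz  # modified right-hand side
        cp[0] = -av / diag
        dp[0] = r[base] / diag
        for k in range(1, nz):
            denom = diag + av * cp[k - 1]
            cp[k] = -av / denom
            dp[k] = (r[base + k] + av * dp[k - 1]) / denom
        z[base + nz - 1] = dp[nz - 1]
        for k in range(nz - 2, -1, -1):
            z[base + k] = dp[k] - cp[k] * z[base + k + 1]
    return z


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pcg(b, nx, ny, nz, ah, av, c, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient; returns (solution, iterations)."""
    n = len(b)
    x = [0.0] * n
    b_norm = dot(b, b) ** 0.5
    if b_norm == 0.0:
        return x, 0
    r = list(b)
    z = apply_preconditioner(r, nx, ny, nz, ah, av, c)
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        q = apply_operator(p, nx, ny, nz, ah, av, c)
        alpha = rz / dot(p, q)
        # Fused loop: update x and r and accumulate ||r||^2 in a single pass,
        # so each vector is read and written once instead of three times.
        rr = 0.0
        for m in range(n):
            x[m] += alpha * p[m]
            r[m] -= alpha * q[m]
            rr += r[m] * r[m]
        if rr ** 0.5 <= tol * b_norm:
            return x, it
        z = apply_preconditioner(r, nx, ny, nz, ah, av, c)
        rz_new = dot(r, z)
        beta = rz_new / rz
        for m in range(n):
            p[m] = z[m] + beta * p[m]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    nx, ny, nz = 4, 4, 8
    rhs = [1.0] * (nx * ny * nz)
    solution, iterations = pcg(rhs, nx, ny, nz, ah=1.0, av=100.0, c=1.0)
    print("iterations:", iterations)
```

In this sketch the vertical index is stored contiguously so that each column solve touches a compact block of memory; a GPU implementation would typically choose its layout to suit coalesced access instead, which is a design decision this example does not attempt to model.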

Bibliographic record

  • Source
    Computing and Visualization in Science, 2013, Issue 2, pp. 41-58 (18 pages)
  • Author affiliations

    Department of Mathematical Sciences, University of Bath (1);

    Edinburgh Parallel Computing Centre (EPCC), The University of Edinburgh (2);

    Department of Mathematical Sciences, University of Bath (1);

    Edinburgh Parallel Computing Centre (EPCC), The University of Edinburgh (2);

    OT-Med, Europôle Méditerranéen de l’Arbois (3);

  • Original format: PDF
  • Text language: English
