Journal of Scientific Computing
Exploiting Batch Processing on Streaming Architectures to Solve 2D Elliptic Finite Element Problems: A Hybridized Discontinuous Galerkin (HDG) Case Study

Abstract

Numerical methods for elliptic partial differential equations (PDEs) within both continuous and hybridized discontinuous Galerkin (HDG) frameworks share the same general structure: local (elemental) matrix generation followed by global linear system assembly and solve. The lack of inter-element communication and the easily parallelizable nature of the local matrix generation stage, coupled with the parallelization techniques developed for linear system solvers, make a numerical scheme for elliptic PDEs a good candidate for implementation on streaming architectures such as modern graphics processing units (GPUs). We propose an algorithmic pipeline for mapping an elliptic finite element method to the GPU and perform a case study for a particular method within the HDG framework. This study compares CPU and GPU implementations of the method and highlights certain performance-critical implementation details. The choice of the HDG method for the case study was dictated by the computationally heavy local matrix generation stage as well as the reduced trace-based communication pattern, which together make the method amenable to the fine-grained parallelism of GPUs. We demonstrate that the HDG method is well suited for GPU implementation, obtaining total speedups on the order of 30-35 times over a serial CPU implementation for moderately sized problems.
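The three-stage structure named in the abstract (local matrix generation, global assembly, solve) can be illustrated with a minimal sketch. It is not the authors' HDG method or their GPU pipeline: it assumes a 1D Poisson problem, -u'' = 1 on (0, 1) with u(0) = u(1) = 0, discretized with continuous linear elements, and it runs serially in pure Python. All function names are hypothetical. The point is only that each element's matrices in stage 1 are computed without reference to any other element, which is the property that makes that stage a natural fit for batch processing.

```python
# Illustrative sketch only: 1D continuous Galerkin, NOT the paper's HDG scheme.
# Problem assumed here: -u'' = 1 on (0, 1), u(0) = u(1) = 0.


def local_matrices(nodes):
    """Stage 1: build one (stiffness, load) pair per element.

    Each iteration depends only on its own element, so the loop could be
    mapped to independent threads in a batched implementation.
    """
    batch = []
    for left, right in zip(nodes[:-1], nodes[1:]):
        h = right - left
        k = [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]  # linear-element stiffness
        f = [h / 2.0, h / 2.0]                           # load for source term 1
        batch.append((k, f))
    return batch


def assemble(batch, n_nodes):
    """Stage 2: scatter-add local contributions into the global system."""
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    F = [0.0] * n_nodes
    for e, (k, f) in enumerate(batch):
        dofs = (e, e + 1)  # global node indices of element e
        for a in range(2):
            F[dofs[a]] += f[a]
            for b in range(2):
                K[dofs[a]][dofs[b]] += k[a][b]
    return K, F


def solve_dirichlet(K, F):
    """Stage 3: solve for interior nodes with u = 0 at both end nodes.

    Dense Gaussian elimination with partial pivoting, chosen for brevity.
    """
    n = len(F)
    idx = list(range(1, n - 1))
    A = [[K[i][j] for j in idx] + [F[i]] for i in idx]  # augmented matrix
    m = len(idx)
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, m):
            factor = A[r][c] / A[c][c]
            for j in range(c, m + 1):
                A[r][j] -= factor * A[c][j]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = A[r][m] - sum(A[r][j] * x[j] for j in range(r + 1, m))
        x[r] = s / A[r][r]
    return [0.0] + x + [0.0]


def solve_poisson_1d(n_elements):
    """Run the three stages on a uniform mesh of the unit interval."""
    nodes = [i / n_elements for i in range(n_elements + 1)]
    batch = local_matrices(nodes)
    K, F = assemble(batch, len(nodes))
    return nodes, solve_dirichlet(K, F)
```

Calling `solve_poisson_1d(8)` returns the mesh nodes and the nodal solution, which can be compared against the exact solution u(x) = x(1 - x)/2. In an HDG setting the local stage is considerably heavier and the global unknowns live on element traces rather than nodes, so this sketch conveys only the overall shape of the pipeline.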
