Journal of Parallel and Distributed Computing

On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method

Abstract

The partial solution variant of the cyclic reduction (PSCR) method is a direct solver that can be applied to certain types of separable block tridiagonal linear systems. Such linear systems arise, e.g., from the Poisson and Helmholtz equations discretized with bilinear finite elements. The separability of the linear system requires that the discretization domain be rectangular and the discretization mesh orthogonal. A generalized graphics processing unit (GPU) implementation of the PSCR method is presented. The numerical results indicate speedups of up to 24-fold compared with an equivalent CPU implementation that uses a single CPU core. The attained floating-point performance is analyzed using the roofline performance model, and the resulting models show that it is limited mainly by the off-chip memory bandwidth and by the effectiveness of the tridiagonal solver used for the arising tridiagonal subproblems. Offline autotuning techniques are used to further improve performance.
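To make "separable block tridiagonal" and "arising tridiagonal subproblems" concrete, here is a minimal sketch under several stated assumptions. It uses a 5-point finite-difference Poisson matrix on an n-by-n grid with zero Dirichlet boundaries, A = T ⊗ I + I ⊗ T with T = tridiag(-1, 2, -1), instead of the paper's bilinear finite elements. It is **not** PSCR: it swaps in the simpler fast diagonalization (partial eigendecomposition) technique, which exploits the same separability by diagonalizing T analytically with sine vectors along one axis and then solving n independent tridiagonal systems along the other. The function names `thomas_solve`, `solve_separable_poisson` and `apply_operator` are hypothetical and chosen for illustration only.

```python
import math


def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm (no pivoting).

    Row i reads: lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i];
    lower[0] and upper[-1] are ignored. Safe for diagonally dominant systems.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_separable_poisson(f):
    """Solve (T kron I + I kron T) u = f, with f given as an n-by-n list f[i][j].

    Assumption: T = tridiag(-1, 2, -1) (unscaled 1D Dirichlet Laplacian).
    """
    n = len(f)
    h = math.pi / (n + 1)
    scale = math.sqrt(2.0 / (n + 1))
    # Orthonormal, symmetric eigenvector matrix of T: q[i][k] = scale*sin((i+1)(k+1)h).
    q = [[scale * math.sin((i + 1) * (k + 1) * h) for k in range(n)] for i in range(n)]
    lam = [2.0 - 2.0 * math.cos((k + 1) * h) for k in range(n)]  # eigenvalues of T

    # Step 1: transform the right-hand side along the first index (dense, O(n^3)).
    f_hat = [[sum(q[i][k] * f[i][j] for i in range(n)) for j in range(n)]
             for k in range(n)]

    # Step 2: n independent tridiagonal subproblems (T + lam_k I) u_hat_k = f_hat_k.
    u_hat = [thomas_solve([-1.0] * n, [2.0 + lam[k]] * n, [-1.0] * n, f_hat[k])
             for k in range(n)]

    # Step 3: transform back to the original basis.
    return [[sum(q[i][k] * u_hat[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apply_operator(u):
    """Apply the 5-point stencil 4u - (four neighbours), zero outside the grid."""
    n = len(u)

    def at(i, j):
        return u[i][j] if 0 <= i < n and 0 <= j < n else 0.0

    return [[4.0 * u[i][j] - at(i - 1, j) - at(i + 1, j) - at(i, j - 1) - at(i, j + 1)
             for j in range(n)] for i in range(n)]
```

For example, with `f = [[1.0] * 8 for _ in range(8)]`, `apply_operator(solve_separable_poisson(f))` should reproduce `f` up to rounding error. The dense transforms are there only for readability; the point of the sketch is Step 2, where separability turns one large block system into many small, mutually independent tridiagonal solves. That independence is the kind of parallelism a GPU implementation can exploit, and each solve streams its coefficients through memory with little arithmetic per byte, which is consistent with the abstract's finding that memory bandwidth and tridiagonal-solver efficiency limit performance.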