Conference: International Conference on High Performance Computing and Simulation

Acceleration Techniques for FETI Solvers for GPU Accelerators


Abstract

In this paper, we evaluate several approaches to performing simultaneous matrix-vector multiplication of large numbers of matrices on a GPU accelerator. The goal of this evaluation is to develop efficient techniques for massively parallel Hybrid Total FETI solvers in our ESPRESO library. FETI solvers generally use sparse matrices. To overcome this, we previously proposed the Local Schur Complement method for FETI, which converts the sparse matrices to a dense representation without significantly increasing the memory requirements on the GPU accelerator. We selected the following techniques: standard GEMV, CUDA streams, dynamic parallelism, batched GEMM, BSR GEMV and HYB GEMV. Our results show that (i) if a FETI solver contains a large number of small matrices, i.e. there is a large number of small subdomains, then the best approach is dynamic parallelism; (ii) if there is a small number of large subdomains, then the best approaches are dynamic parallelism and CUDA streams. Note that the Local Schur Complement method in conjunction with Hybrid Total FETI performs better with smaller subdomains.
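To make the workload concrete, the following is a minimal CPU reference sketch of the operation the listed techniques accelerate: one dense matrix-vector product per subdomain, with every product independent of the others. It is not taken from the paper or from the ESPRESO library. The names `DenseBlock`, `gemv` and `batched_gemv`, and the row-major storage layout, are assumptions made only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: one dense, row-major matrix per subdomain.
struct DenseBlock {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> a;  // rows * cols entries, row-major
};

// y = A * x for a single dense block.
std::vector<double> gemv(const DenseBlock& m, const std::vector<double>& x) {
    assert(x.size() == m.cols);
    assert(m.a.size() == m.rows * m.cols);
    std::vector<double> y(m.rows, 0.0);
    for (std::size_t i = 0; i < m.rows; ++i) {
        double acc = 0.0;
        for (std::size_t j = 0; j < m.cols; ++j) {
            acc += m.a[i * m.cols + j] * x[j];
        }
        y[i] = acc;
    }
    return y;
}

// Apply every block to its own input vector. The loop iterations do not
// depend on each other, which is the property that GPU approaches such as
// streams or batched kernels can exploit (this sketch runs them serially).
std::vector<std::vector<double>> batched_gemv(
    const std::vector<DenseBlock>& blocks,
    const std::vector<std::vector<double>>& xs) {
    assert(blocks.size() == xs.size());
    std::vector<std::vector<double>> ys;
    ys.reserve(blocks.size());
    for (std::size_t k = 0; k < blocks.size(); ++k) {
        ys.push_back(gemv(blocks[k], xs[k]));
    }
    return ys;
}
```

A serial reference of this kind is mainly useful for checking the results of a GPU implementation. The two regimes described in the abstract correspond to calling `batched_gemv` with many small blocks or with few large ones.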
