Journal: ACM Transactions on Parallel Computing

Algorithms and Data Structures for Matrix-Free Finite Element Operators with MPI-Parallel Sparse Multi-Vectors



Abstract

Traditional solution approaches for problems in quantum mechanics scale as O(M^3), where M is the number of electrons. Various methods have been proposed to address this issue and obtain linear scaling, O(M). One promising formulation is the direct minimization of energy. Such methods take advantage of the physical localization of the solution, allowing it to be sought in terms of non-orthogonal orbitals with local support. This work proposes a numerically efficient implementation of sparse parallel vectors within the open-source finite element library deal.II. The main algorithmic ingredient is the matrix-free evaluation of the Hamiltonian operator by cell-wise quadrature. Based on an a priori chosen support for each vector, we develop algorithms and data structures to perform (i) matrix-free sparse matrix-multivector products (SpMM), (ii) the projection of an operator onto a sparse subspace (inner products), and (iii) post-multiplication of a sparse multivector with a square matrix. The node-level performance is analyzed using a roofline model. Our matrix-free implementation of finite element operators with sparse multivectors achieves 157 GFlop/s on a 20-core Intel Cascade Lake processor. Strong and weak scaling results are reported for a representative benchmark problem using quadratic and quartic finite element bases.
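A minimal sketch of the three operations (i) to (iii), written in Python for brevity; it is not taken from the paper and does not use the deal.II API. It assumes a 1D mesh of linear elements with uniform cell size h, uses the Laplacian (kinetic-energy) term as a stand-in for the Hamiltonian, and stores each column as a dictionary over its a priori chosen support. The names `SparseMultiVector`, `touched_cells`, `apply_stiffness`, `project` and `post_multiply` are hypothetical. The operator is never assembled: each cell's contribution is computed with one-point quadrature, which is exact here because the gradients of linear basis functions are constant on a cell. Truncating the post-multiplied columns to the supports of the input columns is likewise an assumption of this sketch.

```python
# Hypothetical sketch, not the deal.II API: 1D linear elements with uniform
# cell size h, the Laplacian standing in for the Hamiltonian, and sparse
# columns stored as {global node index: value} on an a priori fixed support.
from typing import Dict, List, Set


class SparseMultiVector:
    """A set of columns, each nonzero only on its own a priori fixed support."""

    def __init__(self, supports: List[Set[int]]):
        self.supports = [set(s) for s in supports]
        self.cols: List[Dict[int, float]] = [{k: 0.0 for k in s} for s in self.supports]


def touched_cells(support: Set[int], n_cells: int) -> Set[int]:
    """Cells (cell e spans nodes e and e+1) containing at least one support node."""
    return {e for n in support for e in (n - 1, n) if 0 <= e < n_cells}


def apply_stiffness(x: SparseMultiVector, n_cells: int, h: float) -> SparseMultiVector:
    """Matrix-free SpMM, Y = A X: A is never assembled, only evaluated cell by cell."""
    cells_per_col = [touched_cells(s, n_cells) for s in x.supports]
    # The sparsity of A x_j is known a priori: all nodes of the touched cells.
    y = SparseMultiVector([{v for e in c for v in (e, e + 1)} for c in cells_per_col])
    for j, cells in enumerate(cells_per_col):
        u = x.cols[j]
        for e in cells:
            # One-point (midpoint) quadrature: grad u is constant, weight is h.
            grad_u = (u.get(e + 1, 0.0) - u.get(e, 0.0)) / h
            w = h
            y.cols[j][e] += (-1.0 / h) * grad_u * w      # grad of test function phi_e
            y.cols[j][e + 1] += (1.0 / h) * grad_u * w   # grad of test function phi_{e+1}
    return y


def project(x: SparseMultiVector, ax: SparseMultiVector) -> List[List[float]]:
    """Inner products P_ij = x_i . (A x_j), summed only where supports overlap."""
    n = len(x.cols)
    return [[sum((x.cols[i][k] * ax.cols[j][k]
                  for k in x.supports[i] & ax.supports[j]), 0.0)
             for j in range(n)] for i in range(n)]


def post_multiply(x: SparseMultiVector, m: List[List[float]]) -> SparseMultiVector:
    """Y = X M for a square matrix M; column j of Y is truncated to supp(x_j)."""
    y = SparseMultiVector(x.supports)
    for j in range(len(x.cols)):
        for i in range(len(x.cols)):
            if m[i][j] == 0.0:
                continue  # skip zero coefficients of M
            for k, v in x.cols[i].items():
                if k in y.supports[j]:  # assumed truncation to the a priori support
                    y.cols[j][k] += v * m[i][j]
    return y


# Usage: 6 cells (nodes 0..6) and two overlapping localized columns.
x = SparseMultiVector([{1, 2, 3}, {3, 4, 5}])
x.cols[0].update({1: 1.0, 2: 2.0, 3: 1.0})
x.cols[1].update({3: 1.0, 4: 1.0, 5: 1.0})
ax = apply_stiffness(x, n_cells=6, h=1.0)
p = project(x, ax)  # [[4.0, -1.0], [-1.0, 2.0]]
y = post_multiply(x, [[1.0, 0.5], [0.0, 1.0]])
```

The double loop in `project` visits every pair of columns for simplicity; a linear-scaling version would only visit pairs whose supports overlap, which for localized orbitals is a bounded number per column.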
