Journal: Parallel Computing

Parallelization and scalability analysis of inverse factorization using the chunks and tasks programming model

Abstract

We present three methods for distributed-memory parallel inverse factorization of block-sparse Hermitian positive definite matrices. The three methods are a recursive variant of the AINV inverse Cholesky algorithm, iterative refinement, and localized inverse factorization. All three methods are implemented using the Chunks and Tasks programming model, building on the distributed sparse quad-tree matrix representation and parallel matrix-matrix multiplication in the publicly available Chunks and Tasks Matrix Library (CHTML). Although the algorithms are generally applicable, this work was mainly motivated by the need for efficient and scalable inverse factorization of the basis set overlap matrix in large-scale electronic structure calculations. We perform various computational tests on overlap matrices for quasi-linear glutamic acid-alanine molecules and three-dimensional water clusters, discretized using the standard Gaussian basis set STO-3G, with more than 10 million basis functions in the largest cases. We show that for such matrices the computational cost increases only linearly with system size for all three methods. We show, both theoretically and in numerical experiments, that the methods based on iterative refinement and localized inverse factorization outperform previous parallel implementations in weak scaling tests, where the system size is increased in direct proportion to the number of processes. We also show that, compared to the method based on pure iterative refinement, localized inverse factorization requires much less communication. © 2019 Elsevier B.V. All rights reserved.
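To make the notion of an inverse factor concrete, the following is a minimal serial sketch of iterative refinement in its simplest first-order form, written with dense NumPy arrays. It is not the paper's implementation: it does not use the Chunks and Tasks model, CHTML, or sparse quad-tree matrices. The function name `refine_inverse_factor`, the scaled-identity starting guess, the tolerance, and the iteration cap are all assumptions made for illustration. The goal is a matrix Z with Z^* S Z = I, so that Z Z^* = S^{-1}.

```python
import numpy as np

def refine_inverse_factor(S, tol=1e-12, max_iter=100):
    """Hypothetical helper: dense first-order iterative refinement.

    Seeks Z such that Z^* S Z = I for a Hermitian positive definite S.
    """
    n = S.shape[0]
    I = np.eye(n)
    # Assumed starting guess: Z0 = I / sqrt(||S||_2). The eigenvalues of
    # Z0^* S Z0 then lie in (0, 1], so the initial error matrix
    # delta = I - Z0^* S Z0 has spectral norm below 1.
    Z = I / np.sqrt(np.linalg.norm(S, 2))
    for _ in range(max_iter):
        delta = I - Z.conj().T @ S @ Z   # factorization error
        if np.linalg.norm(delta, "fro") < tol:
            break
        # First-order truncation of the series for (I - delta)^(-1/2).
        Z = Z @ (I + 0.5 * delta)
    return Z

# Small well-conditioned test matrix, loosely resembling an overlap matrix:
# unit diagonal with weak nearest-neighbour coupling.
n = 50
S = np.eye(n) + 0.3 * (np.eye(n, k=1) + np.eye(n, k=-1))
Z = refine_inverse_factor(S)
print(np.linalg.norm(Z.T @ S @ Z - np.eye(n)))  # residual of Z^T S Z = I
```

Every step here is a matrix-matrix multiplication, which is why a parallel sparse multiply is a natural building block for this kind of method. In a dense setting a Cholesky-based inverse factor would usually be the simpler choice; the sketch only illustrates the refinement idea.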