Journal: Engineering with Computers

A parallel log barrier-based mesh warping algorithm for distributed memory machines


Abstract

Parallel dynamic meshes are essential for computational simulations of large-scale scientific applications involving motion. To address this need, we propose parallel LBWARP, a parallel log barrier-based tetrahedral mesh warping algorithm for distributed memory machines. Our algorithm is a general-purpose, geometric mesh warping algorithm that parallelizes the sequential LBWARP algorithm proposed by Shontz and Vavasis. The first step of the algorithm computes, for each interior node, a set of local weights that describe the relative distances from the node to each of its neighbors. This weight computation is the most time-consuming step of the parallel algorithm. Owing to our choice of mesh partition and the corresponding distribution of data and assignment of tasks to processors, the weights are computed in an embarrassingly parallel manner, with no communication among processors. Once this representation of the initial mesh is determined, a target deformation of the boundary is applied, also in an embarrassingly parallel manner. Finally, new coordinates of the interior nodes are obtained by solving a system of linear equations with multiple right-hand sides that is based on the weights and the boundary deformation. This linear system can be solved using one of three parallel sparse linear solvers, namely the distributed block BiCG, block GMRES, or LU algorithm, all of which support linear systems with multiple right-hand side vectors. Our numerical results demonstrate good efficiency and strong scalability of parallel LBWARP on up to 64 processors, with close to linear speedup in all cases. Weak scalability is also demonstrated. The performance of the parallel sparse linear solvers depends on factors such as the mesh size, the amount of available memory, and the number of processors. For example, the distributed LU algorithm performs better on small meshes, whereas the distributed block BiCG and distributed block GMRES algorithms perform better when the amount of available memory is limited. Finally, we demonstrate the performance of parallel LBWARP on a sequence of mesh deformations, for which the runtime of the overall algorithm can be reduced significantly. When applied to k deformations with the distributed LU linear solver, parallel LBWARP reuses the weight matrix that was computed during the first deformation. This gives close to a k-fold speedup for sufficiently many deformations.
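The final step described in the abstract (solving one linear system with several right-hand sides, and reusing work across a sequence of deformations) can be illustrated with a small sequential sketch. Everything below is an assumption made for illustration and is not taken from the paper: the mesh is a toy 2D node set rather than a tetrahedral mesh, the weights are supplied by hand rather than computed by a log-barrier optimization, the system is written as (I - W_II) X_I = W_IB X_B, and the names `lu_factor`, `lu_solve`, `build_system`, `boundary_rhs`, and `warp` are hypothetical. The dense, unpivoted LU routine merely stands in for a distributed sparse LU solver.

```python
def lu_factor(a):
    """Doolittle LU without pivoting; L (unit diagonal) and U share one matrix."""
    n = len(a)
    lu = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu


def lu_solve(lu, rhs):
    """Solve for every column of rhs at once (multiple right-hand sides)."""
    n, m = len(lu), len(rhs[0])
    x = [row[:] for row in rhs]
    for i in range(n):                      # forward substitution with L
        for k in range(i):
            for c in range(m):
                x[i][c] -= lu[i][k] * x[k][c]
    for i in reversed(range(n)):            # back substitution with U
        for k in range(i + 1, n):
            for c in range(m):
                x[i][c] -= lu[i][k] * x[k][c]
        for c in range(m):
            x[i][c] /= lu[i][i]
    return x


def build_system(weights, interior):
    """Assemble I - W_II from the interior-to-interior weights."""
    idx = {name: r for r, name in enumerate(interior)}
    n = len(interior)
    a = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for name in interior:
        for nbr, w in weights[name].items():
            if nbr in idx:
                a[idx[name]][idx[nbr]] -= w
    return a


def boundary_rhs(weights, interior, boundary_xy):
    """One right-hand-side column per coordinate: W_IB times boundary positions."""
    rhs = []
    for name in interior:
        row = [0.0, 0.0]
        for nbr, w in weights[name].items():
            if nbr in boundary_xy:
                row[0] += w * boundary_xy[nbr][0]
                row[1] += w * boundary_xy[nbr][1]
        rhs.append(row)
    return rhs


# Toy data (hypothetical): interior nodes p0 = (1, 1) and p1 = (2, 1).
# Each node's weights are positive, sum to 1, and reproduce the node's
# initial position as a weighted average of its neighbors.
interior = ["p0", "p1"]
initial_boundary = {"b0": (0.0, 0.0), "b1": (0.0, 2.0),
                    "b2": (3.0, 0.0), "b3": (3.0, 2.0)}
weights = {
    "p0": {"b0": 0.25, "b1": 0.25, "p1": 0.5},
    "p1": {"b2": 0.25, "b3": 0.25, "p0": 0.5},
}

# Factor once, then reuse the factors for every deformation in the sequence.
factor = lu_factor(build_system(weights, interior))


def warp(boundary_xy):
    return lu_solve(factor, boundary_rhs(weights, interior, boundary_xy))


deformations = [
    initial_boundary,                                   # no motion
    {k: (2.0 * x + 1.0, y + 2.0)                        # stretch in x, then shift
     for k, (x, y) in initial_boundary.items()},
]
results = [warp(b) for b in deformations]
```

In this sketch the undeformed boundary returns the interior nodes to their initial positions, and the affine boundary motion moves the interior nodes by the same affine map, which follows from the weights summing to one. Only the right-hand sides change between deformations, so the factorization is computed a single time.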
