Conference: IEEE International Parallel and Distributed Processing Symposium
Enhancing Scalability and Load Balancing of Parallel Selected Inversion via Tree-Based Asynchronous Communication

Abstract

We develop a method for improving the parallel scalability of computations that involve asynchronous task execution. We apply this method to the recently developed parallel selected inversion algorithm [Jacquelin, Lin and Yang 2014], named PSelInv, on massively parallel distributed-memory machines. In the PSelInv method, we compute selected elements of the inverse of a sparse matrix A that can be decomposed as A = LU, where L is lower triangular and U is upper triangular. Computing these selected elements of A^{-1} requires restricted collective communications among a subset of processors within each column or row communication group created by a block cyclic distribution of L and U. We describe how this type of restricted collective communication can be implemented using asynchronous point-to-point MPI communications combined with a binary-tree-based data propagation scheme. Because multiple restricted collective communications may take place at the same time, we need to use a heuristic to prevent processors participating in multiple collective communications from receiving too many messages. This heuristic allows us to reduce communication load imbalance and improve the overall scalability of the selected inversion algorithm. For instance, when 6,400 processors are used, we observe that the use of this heuristic leads to over 5x speedup for a number of test matrices. It also mitigates the performance variability introduced by an inhomogeneous network topology.
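
The tree-based propagation idea in the abstract can be illustrated with a small sketch in plain Python (no MPI). It builds a binary forwarding tree over the ranks that take part in one restricted broadcast, then counts how many messages each rank forwards when several such broadcasts overlap. In a reduction along the same tree, the same counts are the number of messages each rank receives. The abstract does not spell out the heuristic, so the circular shift of the receiver list used here is an assumption chosen for illustration, and all function names are hypothetical rather than taken from PSelInv.

```python
from collections import Counter


def binary_tree_children(ranks):
    """Map each rank to the ranks it forwards to; ranks[0] is the root.

    Position i in the list forwards to positions 2i+1 and 2i+2,
    the usual array layout of a binary tree.
    """
    children = {r: [] for r in ranks}
    for i, r in enumerate(ranks):
        for j in (2 * i + 1, 2 * i + 2):
            if j < len(ranks):
                children[r].append(ranks[j])
    return children


def rotated_ranks(root, receivers, shift):
    """Assumed heuristic (illustrative only): circularly shift the
    receiver order so different broadcasts pick different inner nodes."""
    k = shift % len(receivers) if receivers else 0
    return [root] + receivers[k:] + receivers[:k]


def forwarding_load(trees):
    """Total number of messages each rank forwards across all trees."""
    load = Counter()
    for children in trees:
        for rank, kids in children.items():
            load[rank] += len(kids)
    return load


receivers = [1, 2, 3, 4, 5, 6]

# Three concurrent broadcasts from rank 0 with identical trees:
# ranks 1 and 2 are inner nodes every time.
plain = [binary_tree_children(rotated_ranks(0, receivers, 0)) for _ in range(3)]

# Same broadcasts, but each tree uses a different shift of the receivers.
shifted = [binary_tree_children(rotated_ranks(0, receivers, s)) for s in (0, 2, 4)]

plain_load = forwarding_load(plain)
shifted_load = forwarding_load(shifted)
print("unshifted:", dict(plain_load))
print("shifted:  ", dict(shifted_load))
```

In this toy setting the unshifted trees concentrate all forwarding on the same two receivers, while the shifted trees spread it evenly over all six. The root's load is unchanged either way, since it always sends to two children per broadcast. In a real implementation each tree edge would correspond to a nonblocking point-to-point send and receive.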
