Regular distributions for storing dense matrices on parallel systems are not always used in practice. In many scientific applications [...] RUMMA) [1] to handle irregularly distributed matrices. Our approach relies on a distribution-independent algorithm that provides dynamic load balancing by exploiting data locality. It performs as well as the traditional approach, which handles the irregular case by creating temporary arrays with a regular distribution, redistributing the data into them, and then applying matrix multiplication for regular matrices. The proposed algorithm is memory-efficient because temporary matrices are not needed. This feature is critical for systems such as the IBM Blue Gene/L, which offer a very limited amount of memory per node. The experimental results demonstrate very good performance across a range of matrix distributions and problem sizes motivated by real applications.
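To make the idea of multiplying irregularly partitioned matrices without temporary regular copies more concrete, the following is a minimal serial sketch in Python. It is not the algorithm of the paper: it has no parallelism, no remote memory access, and no load balancing. The function names and the "cut point" lists that describe each partition are assumptions introduced only for illustration. The sketch splits the shared dimension at the union of the cut points of A and B, so that every partial product touches exactly one block of each operand and accumulates directly into the target block of C.

```python
def merged_cuts(cuts_a, cuts_b):
    """Union of two lists of cut points along the shared (inner) dimension."""
    return sorted(set(cuts_a) | set(cuts_b))


def multiply_irregular(A, B, c_row_cuts, c_col_cuts, a_col_cuts, b_row_cuts):
    """Compute C = A * B block by block for irregular partitions.

    All *_cuts arguments are hypothetical descriptions of a distribution:
    sorted boundaries such as [0, 1, 3] meaning blocks [0, 1) and [1, 3).
    A and B are plain lists of lists standing in for distributed arrays.
    """
    m, n = len(A), len(B[0])
    C = [[0] * n for _ in range(m)]
    # Segments of the inner dimension that lie inside a single block of A
    # and a single block of B at the same time.
    k_cuts = merged_cuts(a_col_cuts, b_row_cuts)
    for r0, r1 in zip(c_row_cuts, c_row_cuts[1:]):          # row range of a C block
        for c0, c1 in zip(c_col_cuts, c_col_cuts[1:]):      # column range of a C block
            for k0, k1 in zip(k_cuts, k_cuts[1:]):          # one inner segment
                # Accumulate in place; no regularly distributed copy is made.
                for i in range(r0, r1):
                    for j in range(c0, c1):
                        C[i][j] += sum(A[i][k] * B[k][j] for k in range(k0, k1))
    return C


def naive_multiply(A, B):
    """Reference result used to check the blocked version."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]   # 3 x 4
B = [[1, 0, 2], [0, 1, 3], [4, 5, 6], [7, 8, 9]]    # 4 x 3
C = multiply_irregular(
    A, B,
    c_row_cuts=[0, 1, 3],   # C row blocks of height 1 and 2
    c_col_cuts=[0, 2, 3],   # C column blocks of width 2 and 1
    a_col_cuts=[0, 1, 4],   # A column blocks of width 1 and 3
    b_row_cuts=[0, 3, 4],   # B row blocks of height 3 and 1
)
assert C == naive_multiply(A, B)
```

In a real distributed setting each C block would be computed by the process that owns it, and the A and B sub-blocks would be fetched from their owners; the sketch only shows that mismatched partitions can be handled by intersecting them rather than by redistributing the data first.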