Conference: International Conference on Parallel Processing and Applied Mathematics

Parallel One-Sided Jacobi SVD Algorithm with Variable Blocking Factor

Abstract

The parallel one-sided block-Jacobi algorithm for the matrix singular value decomposition (SVD) requires efficient computation of symmetric Gram matrices and their eigenvalue decompositions (EVDs), followed by an update of the matrix columns and right singular vectors by matrix multiplication. In our recent parallel implementation with p processors and blocking factor l = 2p, these tasks are computed serially within each processor in a given parallel iteration step, because each processor holds exactly two block columns of the input matrix A. However, as shown in our previous work, with increasing p (and hence with increasing blocking factor) the number of parallel iteration steps needed for convergence of the whole algorithm increases linearly but faster than proportionally to p, so that it is hard to achieve a good speedup. We propose to break the tight relation l = 2p and to use a small blocking factor l = p/k for some integer k that divides p, with l even. The algorithm then works with pairs of logical block columns that are distributed among processors, so that all computations inside a parallel iteration step are themselves parallel. We discuss the optimal data distribution for parallel subproblems in the one-sided block-Jacobi algorithm and analyze its computational and communication complexity. Experimental results with full matrices of order 8192 show that our new algorithm with a small blocking factor scales well and can be 2-3 times faster than the ScaLAPACK procedure PDGESVD.
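
The three tasks named in the abstract (form a Gram matrix, compute its EVD, update the columns of A and the right singular vectors) can be illustrated with a minimal serial sketch. It is not the authors' parallel implementation. It assumes the simplest case, in which every block column is a single column, so each Gram matrix is 2x2 and its EVD reduces to one closed-form Jacobi rotation. The function name `one_sided_jacobi_svd` and its parameters are hypothetical, chosen only for this illustration.

```python
import math


def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=60):
    """Serial one-sided Jacobi SVD with block width 1 (illustrative only).

    A is a list of m rows with n floats each (m >= n).
    Returns (W, sigma, V), each matrix stored as a list of columns:
    W = A * V has mutually orthogonal columns, sigma[c] is the norm of
    column c of W (a singular value, not sorted), and V is orthogonal.
    """
    m, n = len(A), len(A[0])
    # Store matrices column-wise so that a column pair is easy to access.
    W = [[A[r][c] for r in range(m)] for c in range(n)]
    V = [[1.0 if r == c else 0.0 for r in range(n)] for c in range(n)]

    for _ in range(max_sweeps):
        rotated = False
        for i in range(n - 1):
            for j in range(i + 1, n):
                # Task 1: 2x2 Gram matrix [[a, g], [g, b]] of columns i, j.
                a = sum(x * x for x in W[i])
                b = sum(x * x for x in W[j])
                g = sum(x * y for x, y in zip(W[i], W[j]))
                if abs(g) <= tol * math.sqrt(a * b):
                    continue  # the pair is already numerically orthogonal
                rotated = True
                # Task 2: EVD of the Gram matrix, i.e. the rotation that
                # zeroes its off-diagonal entry g.
                zeta = (b - a) / (2.0 * g)
                t = math.copysign(1.0, zeta) / (
                    abs(zeta) + math.sqrt(1.0 + zeta * zeta)
                )
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                # Task 3: apply the rotation to the columns of A (held in W)
                # and to the accumulated right singular vectors (held in V).
                for M in (W, V):
                    for r in range(len(M[i])):
                        x, y = M[i][r], M[j][r]
                        M[i][r] = c * x - s * y
                        M[j][r] = s * x + c * y
        if not rotated:
            break  # a full sweep without rotations: converged

    sigma = [math.sqrt(sum(x * x for x in col)) for col in W]
    return W, sigma, V
```

For example, `one_sided_jacobi_svd([[1.0, 1.0], [0.0, 1.0]])` yields singular values close to 0.618 and 1.618. In a block version, each column pair would become a pair of block columns, the 2x2 Gram matrix would become a larger symmetric matrix, and the rotation would be replaced by the eigenvector matrix of that Gram matrix applied through matrix multiplication.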
