Journal: ACM Transactions on Mathematical Software

A QDWH-based SVD Software Framework on Distributed-memory Manycore Systems

Abstract

This article presents a high-performance software framework for computing a dense SVD on distributed-memory manycore systems. Originally introduced by Nakatsukasa et al. (2010) and Nakatsukasa and Higham (2013), the SVD solver relies on the polar decomposition computed with the QR-based Dynamically Weighted Halley (QDWH) algorithm. Although the QDWH-based SVD algorithm performs significantly more floating-point operations than the traditional SVD with one-stage bidiagonal reduction, the high level of concurrency inherent in its compute-bound Level 3 BLAS kernels ultimately compensates for the extra arithmetic. Using the ScaLAPACK two-dimensional block-cyclic data distribution with a rectangular processor topology, rather than the default square processor grid, the resulting QDWH-SVD further reduces communication during the panel factorization while increasing the degree of parallelism during the update of the trailing submatrix. After detailing the algorithmic complexity and the memory footprint of the algorithm, we conduct a thorough performance analysis and study the impact of the grid topology on performance by looking at the communication and computation profiling trade-offs. We report performance results against state-of-the-art existing QDWH software implementations (e.g., Elemental) and their SVD extensions on large-scale distributed-memory manycore systems based on commodity Intel x86 Haswell processors and the Knights Landing (KNL) architecture. The QDWH-SVD framework achieves up to 3-fold and 8-fold speedups on the Haswell- and KNL-based platforms, respectively, against ScaLAPACK PDGESVD and proves to be a competitive alternative for both well- and ill-conditioned matrices. Finally, we derive a performance model based on these empirical results.
Our QDWH-based polar decomposition and its SVD extension are freely available at https://github.com/ecrc/qdwh.git and https://github.com/ecrc/ksvd.git, respectively, and have been integrated into the Cray Scientific numerical library LibSci v17.11.1.
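
The idea summarized in the abstract (polar decomposition via QDWH, followed by a symmetric eigendecomposition of the polar factor) can be illustrated with a small single-node NumPy sketch. This is not the authors' distributed implementation, only a toy version under stated assumptions: the input is real with full column rank and at least as many rows as columns; the Frobenius-norm scaling and the lower bound taken from the inverse of the R factor are simple illustrative choices; `np.linalg.eigh` stands in for whatever eigensolver the framework uses; the stopping tolerances are illustrative; and the names `qdwh_polar` and `qdwh_svd` are hypothetical.

```python
import numpy as np


def qdwh_polar(A, max_iter=20):
    """Polar decomposition A = Up @ H via QR-based dynamically weighted Halley.

    Assumes A is real, m >= n, and has full column rank (illustrative sketch).
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    if m < n:
        raise ValueError("this sketch expects m >= n")
    eps = np.finfo(float).eps

    # Scale so that all singular values of X lie in (0, 1].
    # The Frobenius norm is an upper bound on the 2-norm.
    alpha = np.linalg.norm(A, "fro")
    X = A / alpha

    # Lower bound l on the smallest singular value of X:
    # sigma_min(X) = sigma_min(R) >= 1 / ||R^{-1}||_F.
    R = np.linalg.qr(X, mode="r")
    l = 1.0 / np.linalg.norm(np.linalg.inv(R), "fro")

    I = np.eye(n)
    for _ in range(max_iter):
        # Dynamic weights a, b, c computed from the current lower bound l.
        l2 = l * l
        d = np.cbrt(4.0 * (1.0 - l2) / (l2 * l2))
        a = np.sqrt(1.0 + d) + 0.5 * np.sqrt(
            8.0 - 4.0 * d + 8.0 * (2.0 - l2) / (l2 * np.sqrt(1.0 + d))
        )
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0

        # Inverse-free update through a QR factorization of [sqrt(c) X; I].
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, I]))
        Q1, Q2 = Q[:m], Q[m:]
        X_new = (b / c) * X + ((a - b / c) / np.sqrt(c)) * (Q1 @ Q2.T)

        # Propagate the lower bound through the same rational map.
        l = min(l * (a + b * l2) / (1.0 + c * l2), 1.0)

        delta = np.linalg.norm(X_new - X, "fro")
        X = X_new
        # Stop once both the iterate and the lower bound have converged.
        if delta <= np.cbrt(5.0 * eps) and abs(1.0 - l) <= 10.0 * eps:
            break

    Up = X
    H = Up.T @ A
    H = 0.5 * (H + H.T)  # symmetrize to remove rounding asymmetry
    return Up, H


def qdwh_svd(A):
    """SVD A = U @ diag(s) @ Vt built from the polar decomposition."""
    Up, H = qdwh_polar(A)
    w, V = np.linalg.eigh(H)      # H = V diag(w) V^T, eigenvalues ascending
    order = np.argsort(w)[::-1]   # reorder to descending singular values
    s, V = w[order], V[:, order]
    U = Up @ V
    return U, s, V.T
```

Calling `qdwh_svd(A)` on a tall, full-rank matrix returns factors that can be checked against `np.linalg.svd`. Each iteration consists of a QR factorization and matrix-matrix products, which is the kind of Level 3 BLAS work the abstract credits for the algorithm's concurrency.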
