
A QDWH-Based SVD Software Framework on Distributed-Memory Manycore Systems

Abstract

This paper presents a high-performance software framework for computing a dense SVD on distributed-memory manycore systems. Originally introduced by Nakatsukasa et al. (Nakatsukasa et al. 2010; Nakatsukasa and Higham 2013), the SVD solver relies on the polar decomposition computed with the QR-based Dynamically Weighted Halley (QDWH) algorithm. Although the QDWH-based SVD algorithm performs significantly more floating-point operations than the traditional SVD with one-stage bidiagonal reduction, the high level of concurrency inherent in compute-bound Level 3 BLAS kernels ultimately compensates for the extra arithmetic. Using the ScaLAPACK two-dimensional block-cyclic data distribution with a rectangular processor topology, rather than the default square processor grid, the resulting QDWH-SVD further reduces communication during the panel factorization while increasing the degree of parallelism during the update of the trailing submatrix. After detailing the algorithmic complexity and the memory footprint of the algorithm, we conduct a thorough performance analysis and study the impact of the grid topology on performance by examining the trade-offs between communication and computation revealed by profiling. We report performance results against state-of-the-art QDWH software implementations (e.g., Elemental) and their SVD extensions on large-scale distributed-memory manycore systems based on commodity Intel x86 Haswell processors and the Knights Landing (KNL) architecture. The QDWH-SVD framework achieves up to 3-fold and 8-fold speedups on the Haswell- and KNL-based platforms, respectively, against ScaLAPACK PDGESVD, and proves to be a competitive alternative for both well-conditioned and ill-conditioned matrices. Finally, we derive a performance model from these empirical results.
Our QDWH-based polar decomposition and its SVD extension are freely available at https://github.com/ecrc/qdwh.git and https://github.com/ecrc/ksvd.git, respectively, and have been integrated into the Cray Scientific numerical library LibSci v17.11.1.
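
As a rough illustration of the algorithm the abstract describes, the following serial NumPy sketch computes the polar factor with the inverse-free, QR-based QDWH iteration and then derives an SVD from it. It is not the distributed ScaLAPACK-based implementation from the repositories above: the function names `qdwh_polar` and `qdwh_svd` are hypothetical, the input is assumed to be real with m >= n and full column rank, the lower bound on the smallest singular value uses a simple estimate chosen for brevity, and `numpy.linalg.eigh` stands in for whatever symmetric eigensolver a production code would use on the symmetric polar factor.

```python
import numpy as np


def qdwh_polar(A, max_iter=50):
    """Orthogonal polar factor of A (assumed real, m >= n, full column rank)."""
    m, n = A.shape
    eps = np.finfo(float).eps
    # Scale so that every singular value of X lies in (0, 1]; the Frobenius
    # norm is a cheap upper bound on the 2-norm.
    X = A / np.linalg.norm(A, "fro")
    # Lower bound l on sigma_min(X) from the R factor of X:
    # sigma_min(X) = 1 / ||R^-1||_2 >= 1 / (sqrt(n) * ||R^-1||_1).
    R = np.linalg.qr(X, mode="r")
    l = 1.0 / (np.sqrt(n) * np.linalg.norm(np.linalg.inv(R), 1))
    for _ in range(max_iter):
        # Dynamically chosen weights (a, b, c) from the current bound l.
        l2 = l * l
        d = np.cbrt(4.0 * (1.0 - l2) / (l2 * l2))
        s = np.sqrt(1.0 + d)
        a = s + 0.5 * np.sqrt(8.0 - 4.0 * d + 8.0 * (2.0 - l2) / (l2 * s))
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0
        # Inverse-free Halley step via the QR factorization of [sqrt(c) X; I].
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, np.eye(n)]))
        Q1, Q2 = Q[:m], Q[m:]
        X_new = (b / c) * X + (a - b / c) / np.sqrt(c) * (Q1 @ Q2.T)
        step = np.linalg.norm(X_new - X, "fro")
        X = X_new
        # Propagate the lower bound; clamp at 1 to guard against rounding.
        l = min(1.0, l * (a + b * l2) / (1.0 + c * l2))
        if step <= np.cbrt(5.0 * eps) and abs(1.0 - l) <= 10.0 * eps:
            break
    return X


def qdwh_svd(A):
    """SVD of A from its polar decomposition A = Up @ H."""
    Up = qdwh_polar(A)
    H = Up.T @ A
    H = 0.5 * (H + H.T)            # enforce symmetry of the polar factor H
    w, V = np.linalg.eigh(H)       # stand-in symmetric eigensolver (assumption)
    order = np.argsort(w)[::-1]    # singular values in descending order
    w, V = w[order], V[:, order]
    return Up @ V, w, V.T
```

Calling `U, S, Vt = qdwh_svd(A)` on a small random matrix and comparing `S` with `np.linalg.svd(A, compute_uv=False)` is a quick sanity check. Each iteration is dominated by a QR factorization and a matrix-matrix product, which is the Level 3 BLAS-heavy structure the abstract credits for the method's concurrency.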
