A QDWH-based SVD Software Framework on Distributed-memory Manycore Systems

Sukkari Dalal; Ltaief Hatem; Esposito Aniello; Keyes David

Abstract

This article presents a high-performance software framework for computing a dense SVD on distributed-memory manycore systems. Originally introduced by Nakatsukasa et al. (2010) and Nakatsukasa and Higham (2013), the SVD solver relies on the polar decomposition computed with the QR-based Dynamically Weighted Halley algorithm (QDWH). Although the QDWH-based SVD algorithm performs significantly more floating-point operations than the traditional SVD with one-stage bidiagonal reduction, the high level of concurrency inherent in compute-bound Level 3 BLAS kernels ultimately compensates for the arithmetic complexity overhead. Using the ScaLAPACK two-dimensional block cyclic data distribution with a rectangular processor topology, rather than the default square processor grid, the resulting QDWH-SVD further reduces excessive communication during the panel factorization while increasing the degree of parallelism during the update of the trailing submatrix. After detailing the algorithmic complexity and the memory footprint of the algorithm, we conduct a thorough performance analysis and study the impact of the grid topology on performance by examining the trade-offs between communication and computation revealed by profiling. We report performance results against state-of-the-art existing QDWH software implementations (e.g., Elemental) and their SVD extensions on large-scale distributed-memory manycore systems based on commodity Intel x86 Haswell processors and the Knights Landing (KNL) architecture. The QDWH-SVD framework achieves up to 3-fold and 8-fold speedups on the Haswell-based and KNL-based platforms, respectively, against ScaLAPACK PDGESVD, and turns out to be a competitive alternative for both well-conditioned and ill-conditioned matrices. Finally, we derive a performance model based on these empirical results.
Our QDWH-based polar decomposition and its SVD extension are freely available at https://github.com/ecrc/qdwh.git and https://github.com/ecrc/ksvd.git, respectively, and have been integrated into the Cray Scientific numerical library LibSci v17.11.1.
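
The polar-decomposition route to the SVD described in the abstract can be illustrated with a small single-node NumPy sketch. It is not the authors' distributed ScaLAPACK-based implementation: the function names `qdwh_polar` and `qdwh_svd`, the Frobenius-norm scaling, the lower-bound estimate, and the stopping tolerance are assumptions chosen for illustration. The weight formulas follow the QDWH iteration as published by Nakatsukasa et al. (2010), and the sketch assumes a real matrix with full column rank and at least as many rows as columns.

```python
import numpy as np


def qdwh_polar(A, tol=1e-12, max_iter=50):
    """Orthonormal polar factor of a full-column-rank real A (m >= n)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Scale so every singular value of X lies in (0, 1]; the Frobenius norm
    # is a cheap upper bound on the 2-norm (illustrative choice).
    X = A / np.linalg.norm(A, "fro")
    # Lower bound on the smallest singular value of X:
    # sigma_min(X) = sigma_min(R) >= 1 / ||R^{-1}||_F.
    R = np.linalg.qr(X, mode="r")
    l = 1.0 / np.linalg.norm(np.linalg.inv(R), "fro")
    eye = np.eye(n)
    for _ in range(max_iter):
        l2 = l * l
        # Dynamically chosen weights (a, b, c) of the Halley iteration.
        d = np.cbrt(4.0 * (1.0 - l2) / (l2 * l2))
        a = np.sqrt(1.0 + d) + 0.5 * np.sqrt(
            8.0 - 4.0 * d + 8.0 * (2.0 - l2) / (l2 * np.sqrt(1.0 + d))
        )
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0
        # Inverse-free update: QR of the stacked matrix [sqrt(c) X; I].
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, eye]))
        Q1, Q2 = Q[:m], Q[m:]
        X_new = (b / c) * X + ((a - b / c) / np.sqrt(c)) * (Q1 @ Q2.T)
        # Propagate the lower bound; it tends to 1 as X becomes orthonormal.
        l = min(l * (a + b * l2) / (1.0 + c * l2), 1.0)
        converged = np.linalg.norm(X_new - X, "fro") < tol
        X = X_new
        if converged:
            break
    return X


def qdwh_svd(A):
    """SVD via polar decomposition A = Up H and eigendecomposition of H."""
    A = np.asarray(A, dtype=float)
    Up = qdwh_polar(A)
    H = Up.T @ A
    H = 0.5 * (H + H.T)            # remove rounding asymmetry
    w, V = np.linalg.eigh(H)       # eigenvalues in ascending order
    order = np.argsort(w)[::-1]    # singular values in descending order
    s, V = w[order], V[:, order]
    return Up @ V, s, V.T


rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
U, s, Vt = qdwh_svd(A)
```

With these definitions, `U @ np.diag(s) @ Vt` reproduces `A` up to rounding error. The work is dominated by the stacked QR factorization and matrix products, which are the Level 3 BLAS-style kernels the abstract credits for the method's concurrency.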

Bibliographic record

Source
ACM Transactions on Mathematical Software | 2019, Issue 2 | 18.1-18.21 | 21 pages
Authors
Sukkari Dalal; Ltaief Hatem; Esposito Aniello; Keyes David
Author affiliations

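King Abdullah University of Science and Technology, Extreme Computing Research Center, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955, Saudi Arabia;

King Abdullah University of Science and Technology, Extreme Computing Research Center, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955, Saudi Arabia;

Cray Computer GmbH, Cray EMEA Research Lab (CERL), Basel, Switzerland;

King Abdullah University of Science and Technology, Extreme Computing Research Center, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955, Saudi Arabia;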

Original format: PDF
Language: English
Keywords
Dense SVD solver; polar decomposition; QDWH; performance analysis; distributed-memory manycore systems;

Similar literature

1. A QDWH-based SVD Software Framework on Distributed-memory Manycore Systems [J]. Sukkari Dalal, Ltaief Hatem, Esposito Aniello. ACM Transactions on Mathematical Software. 2019, Issue 2
2. Hardware-Software Collaborative Thermal Sensing in Optical Network-on-Chip-based Manycore Systems [J]. ACM Transactions on Embedded Computing Systems. 2019, Issue 6
3. SAM: Software-Assisted Memory Hierarchy for Scalable Manycore Embedded Systems [J]. Majid Shoushtari, Nikil Dutt. Embedded Systems Letters, IEEE. 2017, Issue 4
4. A Machine Learning Framework for Multi-Objective Design Space Exploration and Optimization of Manycore Systems [C]. Biresh Kumar Joardar, Aryan Deshwal, Janardhan Rao Doppa. ACM/IEEE Workshop on Machine Learning for CAD. 2019
5. Software Assists to On-chip Memory Hierarchy of Manycore Embedded Systems [D]. Shoushtari, Abdolmajid Namaki. 2018
6. StakeMeter: Value-Based Stakeholder Identification and Quantification Framework for Value-Based Software Systems [O]. Muhammad Imran Babar, Masitah Ghazali, Dayang N. A. Jawawi.
7. A QDWH-Based SVD Software Framework on Distributed-Memory Manycore Systems [O]. Sukkari Dalal, Ltaief Hatem, Esposito Aniello. 2017
