A QDWH-based SVD Software Framework on Distributed-memory Manycore Systems

Sukkari Dalal; Ltaief Hatem; Esposito Aniello; Keyes David

Abstract

This article presents a high-performance software framework for computing a dense SVD on distributed-memory manycore systems. Originally introduced by Nakatsukasa et al. (2010) and Nakatsukasa and Higham (2013), the SVD solver relies on the polar decomposition computed with the QR-based Dynamically Weighted Halley (QDWH) algorithm. Although the QDWH-based SVD algorithm performs significantly more floating-point operations than the traditional SVD with one-stage bidiagonal reduction, the high level of concurrency inherent in its compute-bound Level 3 BLAS kernels ultimately compensates for the arithmetic complexity overhead. Using the ScaLAPACK two-dimensional block-cyclic data distribution with a rectangular processor topology, rather than the default square processor grid, the resulting QDWH-SVD further reduces communication during the panel factorization while increasing the degree of parallelism during the update of the trailing submatrix. After detailing the algorithmic complexity and the memory footprint of the algorithm, we conduct a thorough performance analysis and study the impact of the grid topology on performance by examining the trade-off between communication and computation in the profiling results. We report performance results against state-of-the-art QDWH software implementations (e.g., Elemental) and their SVD extensions on large-scale distributed-memory manycore systems based on commodity Intel x86 Haswell processors and the Knights Landing (KNL) architecture. The QDWH-SVD framework achieves up to 3-fold and 8-fold speedups on the Haswell- and KNL-based platforms, respectively, against ScaLAPACK PDGESVD and proves to be a competitive alternative for both well- and ill-conditioned matrices. Finally, we derive a performance model based on these empirical results.
Our QDWH-based polar decomposition and its SVD extension are freely available at https://github.com/ecrc/qdwh.git and https://github.com/ecrc/ksvd.git, respectively, and have been integrated into the Cray Scientific numerical library LibSci v17.11.1.
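
The two-stage structure described in the abstract (a QDWH polar decomposition followed by an SVD extension) can be illustrated with a small single-node sketch. This is not the authors' distributed implementation: it assumes a square, nonsingular, real matrix; it uses an explicit inverse in place of a proper condition estimator to bound the smallest singular value; and it uses NumPy's `eigh` for the second stage, which is a choice made for this example only. The function names `qdwh_polar` and `qdwh_svd` are hypothetical.

```python
import numpy as np


def qdwh_polar(A, max_iter=50):
    """Orthogonal polar factor of a square nonsingular real matrix via QDWH."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    alpha = np.linalg.norm(A, "fro")  # upper bound on the 2-norm of A
    X = A / alpha
    # Lower bound on sigma_min(X): 1 / (sqrt(n) * ||X^{-1}||_1).
    # Assumption of this sketch: an explicit inverse replaces a
    # condition estimator.
    l = 1.0 / (np.sqrt(n) * np.linalg.norm(np.linalg.inv(X), 1))
    tol = (5.0 * np.finfo(float).eps) ** (1.0 / 3.0)
    eye = np.eye(n)
    for _ in range(max_iter):
        l = min(l, 1.0)
        l2 = l * l
        # Dynamically chosen weights (a, b, c) of the Halley iteration.
        d = np.cbrt(4.0 * (1.0 - l2) / (l2 * l2))
        a = np.sqrt(1.0 + d) + 0.5 * np.sqrt(
            8.0 - 4.0 * d + 8.0 * (2.0 - l2) / (l2 * np.sqrt(1.0 + d))
        )
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0
        # QR-based, inverse-free update: factor [sqrt(c) X; I] = [Q1; Q2] R.
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, eye]))
        Q1, Q2 = Q[:n], Q[n:]
        X_new = (b / c) * X + (a - b / c) / np.sqrt(c) * (Q1 @ Q2.T)
        # Update the lower bound on the smallest singular value.
        l = l * (a + b * l2) / (1.0 + c * l2)
        converged = np.linalg.norm(X_new - X, "fro") <= tol
        X = X_new
        if converged:
            break
    return X


def qdwh_svd(A):
    """SVD A = U diag(s) Vt built from the polar decomposition A = Up H."""
    A = np.asarray(A, dtype=float)
    Up = qdwh_polar(A)
    H = Up.T @ A
    H = 0.5 * (H + H.T)  # enforce symmetry of the positive-definite factor
    w, V = np.linalg.eigh(H)  # H = V diag(w) V^T, eigenvalues ascending
    order = np.argsort(w)[::-1]  # reorder to descending singular values
    s, V = w[order], V[:, order]
    return Up @ V, s, V.T
```

Each iteration is dominated by a QR factorization and a matrix product, which is the kind of Level 3 BLAS work the abstract refers to. A quick check is to call `qdwh_svd` on a random matrix and compare the returned singular values with those from `np.linalg.svd`.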

Bibliographic details

Source
ACM Transactions on Mathematical Software | 2019, Issue 2 | pp. 18.1-18.21 | 21 pages
Authors
Sukkari Dalal; Ltaief Hatem; Esposito Aniello; Keyes David
Author affiliations

King Abdullah Univ Sci & Technol, Extreme Comp Res Ctr, Comp Elect & Math Sci & Engn CEMSE Div, Thuwal 23955, Saudi Arabia;

King Abdullah Univ Sci & Technol, Extreme Comp Res Ctr, Comp Elect & Math Sci & Engn CEMSE Div, Thuwal 23955, Saudi Arabia;

Cray Comp GmbH, Cray EMEA Res Lab CERL, Basel, Switzerland;

King Abdullah Univ Sci & Technol, Extreme Comp Res Ctr, Comp Elect & Math Sci & Engn CEMSE Div, Thuwal 23955, Saudi Arabia;

Original format: PDF
Language: English
Keywords
Dense SVD solver; polar decomposition; QDWH; performance analysis; distributed-memory manycore systems;
