QR factorization of tall and skinny matrices in a grid computing environment

Abstract

Previous studies have reported that common dense linear algebra operations do not achieve speedup when using multiple geographical sites of a computational grid. Because such operations are the building blocks of most scientific applications, conventional supercomputers still strongly predominate in high-performance computing, and the use of grids to speed up large-scale scientific problems is limited to applications that exhibit parallelism at a higher level. We have identified two performance bottlenecks in the distributed-memory algorithms implemented in ScaLAPACK, a state-of-the-art dense linear algebra library. First, because ScaLAPACK assumes a homogeneous communication network, the implementations of ScaLAPACK algorithms lack locality in their communication pattern. Second, the number of messages sent in the ScaLAPACK algorithms is significantly greater than in other algorithms that trade flops for communication. In this paper, we present a new approach for computing a QR factorization (one of the main dense linear algebra kernels) of tall and skinny matrices in a grid computing environment that overcomes these two bottlenecks. Our contribution is to combine a recently proposed algorithm (Communication Avoiding QR) with a topology-aware middleware (QCG-OMPI) in order to confine intensive communications (ScaLAPACK calls) within the different geographical sites. An experimental study conducted on the Grid'5000 platform shows that the resulting performance increases linearly with the number of geographical sites on large-scale problems (and is, in particular, consistently higher than ScaLAPACK's).
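The tall-and-skinny case of Communication Avoiding QR (often called TSQR) can be illustrated with a minimal sketch. Each domain (standing in for a geographical site) factors its own block of rows independently, and only the small n × n R factors travel up a binary reduction tree, so the critical path has about log2(P) inter-domain message rounds for P domains; a column-by-column Householder QR instead needs a reduction for every column, roughly on the order of n·log2(P) rounds. This is a pure-Python illustration under stated assumptions, not the paper's QCG-OMPI/ScaLAPACK implementation: it computes only R, uses a plain Householder QR at the leaves in place of ScaLAPACK, and the function names `householder_r` and `tsqr_r` and the binary tree shape are hypothetical choices.

```python
import math
import random


def householder_r(a):
    """R factor (n x n, upper triangular) of an m x n matrix a, m >= n.

    Uses Householder reflections; Q is never formed because this TSQR
    sketch only passes R factors up the reduction tree.
    """
    m, n = len(a), len(a[0])
    if m < n:
        raise ValueError("need at least as many rows as columns")
    r = [list(map(float, row)) for row in a]  # work on a float copy
    for k in range(n):
        # Norm of column k on and below the diagonal.
        norm = math.sqrt(sum(r[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            continue  # nothing to annihilate in this column
        # Choose alpha with sign opposite to r[k][k] to avoid cancellation.
        alpha = -norm if r[k][k] >= 0.0 else norm
        v = [0.0] * m
        v[k] = r[k][k] - alpha
        for i in range(k + 1, m):
            v[i] = r[i][k]
        vtv = sum(v[i] ** 2 for i in range(k, m))
        # Apply H = I - 2 v v^T / (v^T v) to columns k..n-1.
        for j in range(k, n):
            f = 2.0 * sum(v[i] * r[i][j] for i in range(k, m)) / vtv
            for i in range(k, m):
                r[i][j] -= f * v[i]
    # Keep the top n rows; zero the numerically tiny entries below the diagonal.
    return [[r[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]


def tsqr_r(a, num_domains):
    """R factor of a tall-and-skinny matrix via a binary reduction tree.

    Returns (R, levels); levels is the number of tree levels, i.e. the
    number of inter-domain message rounds on the critical path.
    """
    m, n = len(a), len(a[0])
    bounds = [d * m // num_domains for d in range(num_domains + 1)]
    if any(hi - lo < n for lo, hi in zip(bounds, bounds[1:])):
        raise ValueError("each domain needs at least n rows")
    # Leaves: each domain factors its own row block independently
    # (in the paper's setting, this is where intra-site ScaLAPACK work runs).
    rs = [householder_r(a[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]
    levels = 0
    # Tree: merge pairs of n x n R factors by factoring their 2n x n stack.
    while len(rs) > 1:
        merged = []
        for i in range(0, len(rs), 2):
            if i + 1 < len(rs):
                merged.append(householder_r(rs[i] + rs[i + 1]))
            else:
                merged.append(rs[i])  # an unpaired domain moves up unchanged
        rs = merged
        levels += 1
    return rs[0], levels


if __name__ == "__main__":
    rng = random.Random(0)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(64)]
    r_tree, levels = tsqr_r(a, num_domains=4)
    r_flat = householder_r(a)
    print("reduction levels:", levels)
    # R is unique only up to the sign of each row, so compare magnitudes.
    diff = max(abs(abs(x) - abs(y))
               for rt, rf in zip(r_tree, r_flat) for x, y in zip(rt, rf))
    print("max magnitude difference vs. flat QR:", diff)
```

Because the R factor of a full-rank matrix is unique only up to the sign of each row, the sketch compares magnitudes against a single flat factorization. In the grid setting described above, the leaf factorizations are where intra-site ScaLAPACK calls would go, and only the tree edges cross sites.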

Bibliographic record

Source: IEEE International Symposium on Parallel & Distributed Processing, 2010, 11 pages
Authors: Agullo E.; Coti C.; Dongarra J.; Herault T.; Langou J.
Original format: PDF
Chinese Library Classification: TP311.138-53

Similar documents

1. Tall-and-skinny QR factorization with approximate Householder reflectors on graphics processors [J]. Tomas Andres E., Quintana-Orti Enrique S. Journal of Supercomputing. 2020, no. 11

2. Shifted Cholesky QR for computing the QR factorization of ill-conditioned matrices [J]. Fukaya Takeshi, Kannan Ramaseshan, Nakatsukasa Yuji, et al. SIAM Journal on Scientific Computing. 2020, no. 1

3. Computing approximate Fekete points by QR factorizations of Vandermonde matrices [J]. Alvise Sommariva, Marco Vianello. Computers & Mathematics with Applications. 2009, no. 8

4. QR factorization of tall and skinny matrices in a grid computing environment [C]. Agullo Emmanuel, Coti Camille, Dongarra Jack, et al. 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS). 2010

5. Stable sparse orthogonal factorization of ill-conditioned banded matrices for parallel computing [D]. Huang Qian. 2017

6. Computing eigenvectors of block tridiagonal matrices based on twisted block factorizations [O]. Gerhard König, Michael Moldaschl, Wilfried N. Gansterer

7. QR factorization of tall and skinny matrices in a grid computing environment [O]. Agullo Emmanuel, Coti Camille, Dongarra Jack, et al. 2010
