Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures

Ltaief H.; Kurzak J.; Dongarra J.

Abstract

The objective of this paper is to extend, in the context of multicore architectures, the concepts of tile algorithms [Buttari et al., 2007] for Cholesky, LU, and QR factorizations to the family of two-sided factorizations. In particular, the bidiagonal reduction of a general, dense matrix is very often used as a preprocessing step for calculating the Singular Value Decomposition. Furthermore, in the Top500 list of June 2008, 98 percent of the fastest parallel systems in the world were based on multicores. This confronts the scientific software community with both a daunting challenge and a unique opportunity. The challenge arises from the disturbing mismatch between the design of systems based on this new chip architecture (hundreds of thousands of nodes, a million or more cores, reduced bandwidth and memory available to cores) and the components of the traditional software stack, such as numerical libraries, on which scientific applications have relied for their accuracy and performance. The many-core trend has exacerbated the problem even further, and it becomes critical to efficiently integrate existing or new numerical linear algebra algorithms suitable for such hardware. By exploiting the concept of tile algorithms in the multicore environment (i.e., high level of parallelism with fine granularity and high-performance data representation combined with a dynamic data-driven execution), the band bidiagonal reduction presented here achieves 94 Gflop/s on a 12,000 × 12,000 matrix with 16 Intel Tigerton 2.4 GHz processors. The main drawback of the tile algorithms approach for the bidiagonal reduction is that the full reduction cannot be obtained in one stage. Other methods have to be considered to further reduce the band matrix to the required form.
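To make the idea of a band bidiagonal form concrete, the following is a minimal dense sketch in Python with NumPy. It is not the authors' tile algorithm: it has no tiles, no task scheduling, and no performance tuning. It only alternates a QR factorization of a column panel (applied from the left) with an LQ factorization of a row panel (applied from the right), which is one standard way to reach an upper band matrix whose bandwidth equals the panel width. The function name `band_bidiagonal` and the parameter `nb` are assumptions made for this illustration, and a square input matrix is assumed.

```python
import numpy as np


def band_bidiagonal(A, nb):
    """Reduce a square matrix to upper band form with bandwidth nb.

    Illustrative dense sketch (hypothetical helper, not the paper's code).
    Only orthogonal transformations are applied from both sides, so the
    singular values of the result equal those of A.
    """
    B = np.array(A, dtype=float)
    n = B.shape[0]
    for s in range(0, n, nb):
        e = min(s + nb, n)
        # Left transformation: QR of the column panel zeroes the entries
        # below the diagonal inside columns s..e-1.
        Q, _ = np.linalg.qr(B[s:, s:e], mode="complete")
        B[s:, s:] = Q.T @ B[s:, s:]
        if e < n:
            # Right transformation: LQ of the row panel, computed as the QR
            # of its transpose, zeroes the entries to the right of the band
            # inside rows s..e-1.
            Q, _ = np.linalg.qr(B[s:e, e:].T, mode="complete")
            B[s:, e:] = B[s:, e:] @ Q
    return B


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((12, 12))
    B = band_bidiagonal(A, nb=3)
    # Entries outside the band 0 <= j - i <= nb are zero up to round-off.
    print(np.allclose(B, np.triu(np.tril(B, 3)), atol=1e-10))
    # The singular values are preserved by the two-sided reduction.
    print(np.allclose(np.linalg.svd(A, compute_uv=False),
                      np.linalg.svd(B, compute_uv=False)))
```

The result is band bidiagonal rather than bidiagonal, which matches the drawback stated in the abstract: a second stage, not shown here, is still needed to reduce the band to true bidiagonal form.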

Bibliographic details

Source
Parallel and Distributed Systems, IEEE Transactions on | 2010, Issue 4 | pp. 417-423 | 7 pages
Authors
Ltaief H.; Kurzak J.; Dongarra J.
Author affiliation

Dept. of Electr. Eng. & Comput. Sci., Univ. of Tennessee, Knoxville, TN, USA

Original format: PDF
Language: English
Keywords
Bidiagonal reduction; multicores; singular value decomposition; tile algorithms

Similar literature

1. High-Performance Bidiagonal Reduction using Tile Algorithms on Homogeneous Multicore Architectures [J]. Hatem Ltaief, Piotr Luszczek, Jack Dongarra. ACM Transactions on Mathematical Software. 2013, Issue 3
2. Two-sided orthogonal reductions to condensed forms on asymmetric multicore processors [J]. Alonso Pedro, Catalan Sandra, Herrero Jose R. Parallel Computing. 2018, Issue octa
3. Scheduling Two-Sided Transformations Using Tile Algorithms on Multicore Architectures [J]. Hatem Ltaief, Jakub Kurzak, Jack Dongarra. Scientific Programming. 2010, Issue 1
4. Enhancing Parallelism of Tile Bidiagonal Transformation on Multicore Architectures Using Tree Reduction [C]. Hatem Ltaief, Piotr Luszczek, Jack Dongarra. International Conference on Parallel Processing and Applied Mathematics. 2012
5. Performance Optimization for Sparse Matrix Factorization Algorithms on Hybrid Multicore Architectures [D]. Tang, Meng. 2020
6. Exploiting Thread-Level and Instruction-Level Parallelism to Cluster Mass Spectrometry Data using Multicore Architectures [O]. Fahad Saeed, Jason D. Hoffert, Trairak Pisitkun
7. Parallel Two-Sided Matrix Reduction to Band Bidiagonal Form on Multicore Architectures [O]. Hatem Ltaief, Jakub Kurzak, Jack Dongarra. 2010
8. Design of a parallel, dense linear algebra software library: Reduction to Hessenberg, tridiagonal, and bidiagonal form [R]. Choi, J., Dongarra, J. J., Walker, D. W. 1994
