Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems

Ichitaro Yamazaki; Tingxing Dong; Raffaele Solca; Stanimire Tomov; Jack Dongarra; Thomas Schulthess

Abstract

For software to fully exploit the computing power of emerging heterogeneous computers, not only must the required computational kernels be optimized for the specific hardware architectures, but an effective scheduling scheme is also needed to utilize the available heterogeneous computational units and to hide the communication between them. As a case study, we develop a static scheduling scheme for the tridiagonalization of a symmetric dense matrix on multicore CPUs with multiple graphics processing units (GPUs) on a single compute node. We then parallelize and optimize the Basic Linear Algebra Subprograms (BLAS)-2 symmetric matrix-vector multiplication and the BLAS-3 low-rank symmetric matrix updates on the GPUs. We demonstrate the good scalability of these multi-GPU BLAS kernels and the effectiveness of our scheduling scheme on twelve Intel Xeon processors and three NVIDIA GPUs. We then integrate our hybrid CPU-GPU kernel into computational kernels at higher levels of the software stack, namely a shared-memory dense eigensolver and a distributed-memory sparse eigensolver. Our experimental results show that our kernels greatly improve the performance of these higher-level kernels, not only reducing the solution time but also enabling the solution of larger-scale problems. Because such symmetric eigenvalue problems arise in many scientific and engineering simulations, our kernels could potentially lead to new scientific discoveries. Furthermore, these dense linear algebra algorithms present algorithmic characteristics that can be found in other algorithms. Hence, they are not only important computational kernels on their own but also useful testbeds to study the performance of the emerging computers and the effects of the various optimization techniques.
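
Both GPU kernels named in the abstract appear in the classical Householder reduction to tridiagonal form. Each step applies a symmetric matrix-vector product (BLAS-2 SYMV) to the trailing matrix and then performs a symmetric rank-2 update. The pure-Python sketch below is an unblocked variant in the spirit of LAPACK's `xSYTD2`, not the authors' multi-GPU implementation. The "devices" are only simulated. The 1D block-cyclic column distribution with block size `nb`, and every function name, are illustrative assumptions.

```python
import math


def symv_partial(A, x, cols):
    """Partial y = A x using only the lower-triangular columns in `cols`."""
    n = len(x)
    y = [0.0] * n
    for j in cols:
        y[j] += A[j][j] * x[j]
        for i in range(j + 1, n):
            y[i] += A[i][j] * x[j]  # stored entry A[i][j]
            y[j] += A[i][j] * x[i]  # mirrored entry A[j][i] == A[i][j]
    return y


def multi_device_symv(A, x, num_devices=3, nb=2):
    """Simulated multi-GPU SYMV (hypothetical helper).

    Assumption: columns are dealt out to devices in blocks of nb
    (1D block-cyclic). Each device computes a partial vector from the
    columns it owns, and the partial vectors are summed (the reduction).
    """
    n = len(x)
    partials = []
    for d in range(num_devices):
        cols = [j for j in range(n) if (j // nb) % num_devices == d]
        partials.append(symv_partial(A, x, cols))
    return [sum(p[i] for p in partials) for i in range(n)]


def householder(x):
    """Return (v, tau, beta) with v[0] == 1 and (I - tau v v^T) x == beta e_1."""
    alpha = x[0]
    sigma = sum(t * t for t in x[1:])
    if sigma == 0.0:
        return [1.0] + [0.0] * (len(x) - 1), 0.0, alpha
    beta = -math.copysign(math.sqrt(alpha * alpha + sigma), alpha)
    v = [1.0] + [t / (alpha - beta) for t in x[1:]]
    return v, (beta - alpha) / beta, beta


def tridiagonalize(A, num_devices=3, nb=2):
    """Unblocked Householder reduction of symmetric A to tridiagonal form.

    Only the lower triangle is read and updated. The input is not
    modified. Returns (d, e): the diagonal and the subdiagonal.
    """
    A = [row[:] for row in A]  # work on a copy
    n = len(A)
    for k in range(n - 2):
        x = [A[i][k] for i in range(k + 1, n)]
        v, tau, beta = householder(x)
        A[k + 1][k] = beta
        if tau == 0.0:
            continue  # column already in tridiagonal form
        sub = [row[k + 1:] for row in A[k + 1:]]
        # BLAS-2 step: p = tau * A22 v, computed by the simulated multi-device SYMV
        p = [tau * t for t in multi_device_symv(sub, v, num_devices, nb)]
        # w = p - (tau / 2) (p^T v) v
        c = 0.5 * tau * sum(pi * vi for pi, vi in zip(p, v))
        w = [pi - c * vi for pi, vi in zip(p, v)]
        # Symmetric rank-2 update of the lower triangle: A22 -= v w^T + w v^T
        m = n - k - 1
        for j in range(m):
            for i in range(j, m):
                A[k + 1 + i][k + 1 + j] -= v[i] * w[j] + w[i] * v[j]
    d = [A[i][i] for i in range(n)]
    e = [A[i + 1][i] for i in range(n - 1)]
    return d, e
```

For example, `tridiagonalize([[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 5.0]])` gives a diagonal of approximately `[4.0, 4.6, 3.4]` and a subdiagonal of approximately `[-2.236, -0.8]`.

In the blocked algorithm (LAPACK's `xSYTRD`), the rank-2 updates from a panel of `nb` columns are accumulated and applied together as a single rank-2nb `xSYR2K` update, which is a BLAS-3 operation. The SYMV, by contrast, must still sweep the whole trailing matrix at every step. It therefore remains a memory-bound BLAS-2 kernel and is a natural candidate for splitting across GPUs.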

Bibliographic details

Source
Concurrency and Computation: Practice and Experience, 2014, No. 16, pp. 2652-2666 (15 pages)
Authors
Ichitaro Yamazaki; Tingxing Dong; Raffaele Solca; Stanimire Tomov; Jack Dongarra; Thomas Schulthess
Affiliations

Electrical Engineering and Computer Science, University of Tennessee, Knoxville, Tennessee, U.S.A.

Electrical Engineering and Computer Science, University of Tennessee, Knoxville, Tennessee, U.S.A.

Institute for Theoretical Physics and Swiss National Supercomputer Center, Eidgenoessische Technische Hochschule (ETH), Zuerich, Switzerland

Electrical Engineering and Computer Science, University of Tennessee, Knoxville, Tennessee, U.S.A.

Electrical Engineering and Computer Science, University of Tennessee, Knoxville, Tennessee, U.S.A.

Institute for Theoretical Physics and Swiss National Supercomputer Center, Eidgenoessische Technische Hochschule (ETH), Zuerich, Switzerland

Indexing information
Original format: PDF
Language of text: English
Keywords
dense linear algebra; GPU accelerators; symmetric tridiagonal reduction; symmetric matrix-vector multiplication; parallel eigensolver

Similar articles

1. Derivative of a Determinant with Respect to an Eigenvalue in the Modified Cholesky Decomposition of a Symmetric Matrix, with Applications to Nonlinear Analysis [J]. Mitsuhiro Kashiwagi. American Journal of Computational Mathematics, 2014, No. 2.
2. Symmetric schemes for computing the minimum eigenvalue of a symmetric Toeplitz matrix [J]. Heinrich Voss. Linear Algebra and its Applications, 1999, No. 1-3.
3. Solving dense symmetric indefinite systems using GPUs [J]. Marc Baboulin, Jack Dongarra, Adrien Rémy. Concurrency and Computation: Practice and Experience, 2017, No. 9.
4. Tridiagonalization of a Symmetric Dense Matrix on a GPU Cluster [C]. Ichitaro Yamazaki, Tingxing Dong, Stanimire Tomov. IEEE International Parallel and Distributed Processing Symposium Workshops and PhD Forum, 2013.
5. Symmetric functions of the eigenvalues of a matrix [D]. Andrius Jonas Kulikauskas, 1993.
6. Multiple-rank modification of symmetric eigenvalue problem [O]. HyungSeon Oh, Zhe Hu, 2018.
7. Optimizing Symmetric Dense Matrix-Vector Multiplication on GPUs [O]. Rajib Nath, Stanimire Tomov, Tingxing …, 2011.
8. Divide-and-conquer method for tridiagonalizing symmetric matrices with repeated eigenvalues [R]. C. H. Bischof, X. Sun, 1994.