Efficient and portable parallel algorithms for Cholesky decomposition.

机译：用于Cholesky分解的高效且可移植的并行算法。

获取原文

获取原文并翻译 | 示例

页面导航

摘要
著录项
相似文献
相关主题

摘要

In this dissertation we consider design issues such as data, layout, data, dependency, communication scheduling, and parallel models in order to develop parallel algorithms for Cholesky decomposition. Our intention is to design efficient and portable parallel algorithms for large scale dense linear systems to be run in the distributed memory environment.; To consider the issue of portability, we utilize the well known LogP model for designing parallel algorithms and for performing theoretical run time analysis. We developed a contention free communication scheduling for the parallel algorithms in order to reduce the communication cost. To study the impact of data layout on performance, we designed the parallel algorithms based on column, row and block data distribution utilizing the fan-in and fan-out methods. Specifically we introduced panel size for 1D layouts and granularity size for 2D layouts for the task partitioning. To determine the efficiency of the parallel algorithms, we derived lower bound results and theoretical run time performance. The analysis on the parallel algorithms shows that parallel algorithms for 1D data layouts are optimal. However for 2D data layouts, the parallel algorithms are efficient, if not necessarily asymptotically optimal.; We then implemented the parallel algorithms using MPI. The experimental results show that fan-in outperforms fan-out for 2D layouts, especially for larger processor size. The performance for the 1D fan-out row-based parallel algorithm was no worse than that of the column-based parallel algorithms. This seems to refute the common notion that row-based parallel algorithms are not, as efficient as column-based parallel algorithms. The experimental results indicated that performance was impacted as panel size and/or granularity size varied. This provides insights that the constant factors for lower bound and run time performance call not be ignored.

机译：在本文中，我们考虑了诸如数据，布局，数据，依赖性，通信调度和并行模型之类的设计问题，以便开发用于Cholesky分解的并行算法。我们的目的是为在分布式存储环境中运行的大规模密集线性系统设计高效且可移植的并行算法。为了考虑可移植性问题，我们利用众所周知的LogP模型来设计并行算法并进行理论上的运行时分析。我们为并行算法开发了无竞争的通信调度，以降低通信成本。为了研究数据布局对性能的影响，我们使用扇入和扇出方法设计了基于列，行和块数据分布的并行算法。具体来说，我们为一维布局引入了面板尺寸，为任务分区引入了2D布局的粒度尺寸。为了确定并行算法的效率，我们导出了下限结果和理论运行时性能。对并行算法的分析表明，用于一维数据布局的并行算法是最佳的。但是，对于2D数据布局，并行算法是有效的，即使不一定渐近最优。然后，我们使用MPI实现了并行算法。实验结果表明，对于2D布局，扇入效果要优于扇出，尤其是对于较大的处理器尺寸。一维扇出基于行的并行算法的性能不比基于列的并行算法的性能差。这似乎驳斥了基于行的并行算法不如基于列的并行算法高效的普遍观念。实验结果表明，性能受面板尺寸和/或粒度尺寸变化的影响。这提供了见解，即下限和运行时性能调用的常量因素不会被忽略。

著录项

作者
Chu, Pei Yue Liu.;
展开▼
作者单位

Lehigh University.;

展开▼
授予单位 Lehigh University.;
学科 Computer Science.
学位 Ph.D.
年度 2003
页码 123 p.
总页数 123
原文格式 PDF
正文语种 eng
中图分类自动化技术、计算机技术;
关键词

相似文献

外文文献
中文文献
专利

1. Explicit parallel block Cholesky algorithms on the CRAY APP [J] . Margreet Nool Applied Numerical Mathematics . 1995,第1a2期

机译：CRAY APP上的显式并行块Cholesky算法
2. A Fully Portable High Performance Minimal Storage Hybrid Format Cholesky Algorithm [J] . BJARNE S. ANDERSEN, JOHN A. GUNNELS, FRED G. GUSTAVSON, ACM transactions on mathematical software . 2005,第2期

机译：完全便携式的高性能最小存储混合格式Cholesky算法
3. A parallel Cholesky algorithm for the solution of symmetriclinear systems [J] . R. R.Khazal, M. M.Chawla International journal of mathematics and mathematical sciences . 2004,第25期

机译：求解对称线性系统的并行Cholesky算法
4. Sequential and Parallel Algorithms for Cholesky Factorization of Sparse Matrices [C] . Nerma Bascelija International Conference on Applied and Theoretical Mechanics . 2013

机译：稀疏矩阵凿弦分解的顺序和并行算法
5. Communication-efficient parallel sparse Cholesky factorization. [D] . Eswar, Kalluri. 1995

机译：具有通信效率的并行稀疏Cholesky分解。
6. Highly efficient and exact method for parallelization of grid-based algorithms and its implementation in DelPhi [O] . Chuan Li, Lin Li, Jie Zhang, -1

机译：基于网格的算法并行化的高效和精确方法及其在Delphi中的实现
7. Level-3 Cholesky Kernel Subroutine of a Fully Portable High Performance Minimal Storage Hybrid Format Cholesky Algorithm [O] . Gustavson Fred G., Wasniewski Jerzy, Dongarra Jack J. 2008

机译：完全便携式高性能最小存储混合格式Cholesky算法的Level-3 Cholesky内核子程序
8. Parallelization of a spherical S(sub N) algorithm based on the spatial domain decomposition. [R] . Haghighat, A., Azmy, Y. Y. 1991

机译：基于空间域分解的球形s（sub N）算法的并行化。

Efficient and portable parallel algorithms for Cholesky decomposition.

摘要

著录项

相似文献

相关主题

期刊订阅