首页> 外文学位 >Efficient and portable parallel algorithms for Cholesky decomposition.
【24h】

Efficient and portable parallel algorithms for Cholesky decomposition.

机译:用于Cholesky分解的高效且可移植的并行算法。

获取原文
获取原文并翻译 | 示例

摘要

In this dissertation we consider design issues such as data, layout, data, dependency, communication scheduling, and parallel models in order to develop parallel algorithms for Cholesky decomposition. Our intention is to design efficient and portable parallel algorithms for large scale dense linear systems to be run in the distributed memory environment.; To consider the issue of portability, we utilize the well known LogP model for designing parallel algorithms and for performing theoretical run time analysis. We developed a contention free communication scheduling for the parallel algorithms in order to reduce the communication cost. To study the impact of data layout on performance, we designed the parallel algorithms based on column, row and block data distribution utilizing the fan-in and fan-out methods. Specifically we introduced panel size for 1D layouts and granularity size for 2D layouts for the task partitioning. To determine the efficiency of the parallel algorithms, we derived lower bound results and theoretical run time performance. The analysis on the parallel algorithms shows that parallel algorithms for 1D data layouts are optimal. However for 2D data layouts, the parallel algorithms are efficient, if not necessarily asymptotically optimal.; We then implemented the parallel algorithms using MPI. The experimental results show that fan-in outperforms fan-out for 2D layouts, especially for larger processor size. The performance for the 1D fan-out row-based parallel algorithm was no worse than that of the column-based parallel algorithms. This seems to refute the common notion that row-based parallel algorithms are not, as efficient as column-based parallel algorithms. The experimental results indicated that performance was impacted as panel size and/or granularity size varied. This provides insights that the constant factors for lower bound and run time performance call not be ignored.
机译:在本文中,我们考虑了诸如数据,布局,数据,依赖性,通信调度和并行模型之类的设计问题,以便开发用于Cholesky分解的并行算法。我们的目的是为在分布式存储环境中运行的大规模密集线性系统设计高效且可移植的并行算法。为了考虑可移植性问题,我们利用众所周知的LogP模型来设计并行算法并进行理论上的运行时分析。我们为并行算法开发了无竞争的通信调度,以降低通信成本。为了研究数据布局对性能的影响,我们使用扇入和扇出方法设计了基于列,行和块数据分布的并行算法。具体来说,我们为一维布局引入了面板尺寸,为任务分区引入了2D布局的粒度尺寸。为了确定并行算法的效率,我们导出了下限结果和理论运行时性能。对并行算法的分析表明,用于一维数据布局的并行算法是最佳的。但是,对于2D数据布局,并行算法是有效的,即使不一定渐近最优。然后,我们使用MPI实现了并行算法。实验结果表明,对于2D布局,扇入效果要优于扇出,尤其是对于较大的处理器尺寸。一维扇出基于行的并行算法的性能不比基于列的并行算法的性能差。这似乎驳斥了基于行的并行算法不如基于列的并行算法高效的普遍观念。实验结果表明,性能受面板尺寸和/或粒度尺寸变化的影响。这提供了见解,即下限和运行时性能调用的常量因素不会被忽略。

著录项

  • 作者

    Chu, Pei Yue Liu.;

  • 作者单位

    Lehigh University.;

  • 授予单位 Lehigh University.;
  • 学科 Computer Science.
  • 学位 Ph.D.
  • 年度 2003
  • 页码 123 p.
  • 总页数 123
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类 自动化技术、计算机技术;
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号