Minimal Data Copy for Dense Linear Algebra Factorization

Abstract

The full-format data structures of dense linear algebra hurt the performance of its factorization algorithms. Full-format rectangular matrices are the input and output of the Level 3 BLAS. It follows that the LAPACK and Level 3 BLAS approach has a basic performance flaw. We describe a new result showing that representing a matrix A as a collection of square blocks reduces the amount of data reformatting required by dense linear algebra factorization algorithms from O(n^3) to O(n^2). On an IBM Power3 processor, our implementation of Cholesky factorization achieves 92% of peak performance, whereas conventional full-format LAPACK dpotrf achieves 77% of peak performance. All programming for our new data structures can be done in standard Fortran through the use of higher-dimensional full-format arrays. Thus, new compiler support may not be necessary. We also discuss the role of concatenating submatrices to facilitate hardware streaming. Finally, we discuss a new concept that we call the L1 / L0 cache interface.
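
The square-block idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a pure-Python illustration under stated assumptions, namely that the block size `nb` divides `n`, that the input is a column-major flat list, and that the function names (`to_square_blocks`, `blocked_cholesky`, and the three kernels) are hypothetical. The matrix is copied once into contiguous `nb` x `nb` blocks, an O(n^2) step, and a blocked Cholesky factorization then works directly on those blocks with no further reformatting. The nested list `blocks[bi][bj]` stands in for the higher-dimensional full-format array the abstract mentions.

```python
import math


def to_square_blocks(a, n, nb):
    """Copy a column-major n x n matrix (flat list) into square-block format.

    Returns blocks[bi][bj], each an nb x nb column-major flat list, so every
    block is contiguous. Each element is copied exactly once: O(n^2) work.
    Assumes nb divides n (a simplification for this sketch).
    """
    assert n % nb == 0
    m = n // nb
    blocks = [[[0.0] * (nb * nb) for _ in range(m)] for _ in range(m)]
    for j in range(n):
        for i in range(n):
            blocks[i // nb][j // nb][(i % nb) + (j % nb) * nb] = a[i + j * n]
    return blocks


def potrf(b, nb):
    """Unblocked lower Cholesky of one block, in place (lower triangle only)."""
    for j in range(nb):
        d = math.sqrt(b[j + j * nb] - sum(b[j + p * nb] ** 2 for p in range(j)))
        b[j + j * nb] = d
        for i in range(j + 1, nb):
            s = sum(b[i + p * nb] * b[j + p * nb] for p in range(j))
            b[i + j * nb] = (b[i + j * nb] - s) / d


def trsm(l, b, nb):
    """Solve X * L^T = B in place, where L is the lower triangle of block l."""
    for j in range(nb):
        for r in range(nb):
            s = sum(b[r + p * nb] * l[j + p * nb] for p in range(j))
            b[r + j * nb] = (b[r + j * nb] - s) / l[j + j * nb]


def gemm_nt(c, a, b, nb):
    """Update C -= A * B^T on three nb x nb blocks."""
    for j in range(nb):
        for i in range(nb):
            c[i + j * nb] -= sum(a[i + p * nb] * b[j + p * nb] for p in range(nb))


def blocked_cholesky(blocks, nb):
    """Right-looking blocked Cholesky on the lower block triangle, in place."""
    m = len(blocks)
    for k in range(m):
        potrf(blocks[k][k], nb)
        for i in range(k + 1, m):
            trsm(blocks[k][k], blocks[i][k], nb)
        for j in range(k + 1, m):
            for i in range(j, m):
                gemm_nt(blocks[i][j], blocks[i][k], blocks[j][k], nb)


# Usage: build a small symmetric positive definite matrix, pack it, factor it.
n, nb = 4, 2
full = [[4.0, 2.0, 0.0, 2.0],
        [2.0, 10.0, 3.0, 1.0],
        [0.0, 3.0, 5.0, 2.0],
        [2.0, 1.0, 2.0, 3.0]]
a = [full[i][j] for j in range(n) for i in range(n)]  # column-major flat list
blocks = to_square_blocks(a, n, nb)
blocked_cholesky(blocks, nb)
for i in range(n):
    # Print row i of the lower-triangular factor, read back from the blocks.
    print([blocks[i // nb][j // nb][(i % nb) + (j % nb) * nb] for j in range(i + 1)])
```

Every kernel here receives whole contiguous blocks, so nothing has to be repacked between calls. That is the property the abstract credits for the reduction in reformatting cost. A production version would replace the three Python kernels with tuned routines.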

Bibliographic details

Source
International Workshop on Applied Parallel Computing: State of the Art in Scientific Computing (PARA 2006), June 18-21, 2006, Umeå, Sweden | 2006 | pp. 540-549 | 10 pages
Conference location: Umeå, Sweden
Authors
Fred G. Gustavson; John A. Gunnels; James C. Sexton
Author affiliation

IBM T. J. Watson Research Center, Yorktown Heights, NY 10598, USA

Original format: PDF
Language: English
Chinese Library Classification: distributed operating systems; parallel operating systems
