IBM Journal of Research and Development

High-performance linear algebra algorithms using new generalized data structures for matrices


Abstract

We present a novel way to produce dense linear algebra factorization algorithms. The current state-of-the-art (SOA) dense linear algebra algorithms have a performance inefficiency, and thus they give suboptimal performance for most LAPACK factorizations. We show that using standard Fortran and C two-dimensional arrays is the main source of this inefficiency. For the other standard format (packed one-dimensional arrays for symmetric and/or triangular matrices), the situation is much worse. We show how to correct these performance inefficiencies by using new data structures (NDS) along with so-called kernel routines. The NDS generalize the current storage layouts for both standard formats. We use the concept of Equivalence and Elementary Matrices along with coordinate (linear) transformations to prove that our method works for an entire class of dense linear algebra algorithms. Also, we use the Algorithms and Architecture approach to explain why our new method gives higher efficiency. The simplest forms of the new factorization algorithms are a direct generalization of the commonly used LINPACK algorithms. On IBM platforms they can be generated from simple, textbook-type codes by the XLF Fortran compiler. On the IBM POWER3 processor, our implementation of Cholesky factorization achieves 92% of peak performance, whereas the conventional SOA full-format LAPACK DPOTRF achieves 77% of peak performance. All programming for our NDS can be accomplished in standard Fortran through the use of three- and four-dimensional arrays. Thus, no new compiler support is necessary. Finally, we describe block hybrid formats (BHF). BHF require no additional storage beyond conventional (full and packed) matrix storage. This means that new algorithms based on BHF can be used as a backward-compatible replacement for LAPACK or LINPACK algorithms.
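
To make the storage idea concrete, below is a minimal C sketch of a square-block layout in the spirit of the NDS described in the abstract: the matrix is viewed as a grid of NB x NB blocks, i.e. addressed like a four-dimensional array, so that each block is contiguous in memory and can be passed directly to a level-3 kernel routine. The block size NB and the names blocked_index and copy_to_blocked are illustrative assumptions, not identifiers from the paper, and the exact block ordering used by the authors may differ.

#include <stdio.h>
#include <stdlib.h>

/* Offset of element (i, j) in a conventional column-major array
 * with leading dimension lda. */
static size_t full_index(size_t i, size_t j, size_t lda) {
    return i + j * lda;
}

/* Offset of element (i, j) in a blocked layout: NB x NB blocks are
 * stored block-column by block-column, and each block is itself
 * column-major.  For simplicity, n is assumed to be a multiple of NB. */
static size_t blocked_index(size_t i, size_t j, size_t n, size_t NB) {
    size_t bi = i / NB, bj = j / NB;   /* block coordinates        */
    size_t ii = i % NB, jj = j % NB;   /* offsets inside the block */
    size_t nb = n / NB;                /* blocks per dimension     */
    return (bj * nb + bi) * NB * NB + jj * NB + ii;
}

/* Copy a column-major matrix A into the blocked layout B. */
static void copy_to_blocked(const double *A, double *B,
                            size_t n, size_t NB) {
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < n; ++i)
            B[blocked_index(i, j, n, NB)] = A[full_index(i, j, n)];
}

int main(void) {
    const size_t n = 8, NB = 4;
    double *A = malloc(n * n * sizeof *A);
    double *B = malloc(n * n * sizeof *B);
    for (size_t k = 0; k < n * n; ++k) A[k] = (double)k;

    copy_to_blocked(A, B, n, NB);

    /* The same element is reachable through either index mapping. */
    printf("A(5,6) = %g, blocked = %g\n",
           A[full_index(5, 6, n)], B[blocked_index(5, 6, n, NB)]);

    free(A);
    free(B);
    return 0;
}

In standard Fortran the same layout can be expressed directly with three- and four-dimensional arrays, as the abstract notes; handling the packed triangular or symmetric case without extra storage is what the BHF variants address.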

