Journal: IBM Journal of Research and Development
Recursion leads to automatic variable blocking for dense linear-algebra algorithms

Abstract

We describe some modifications of the LAPACK dense linear-algebra algorithms using recursion. Recursion leads to automatic variable blocking. LAPACK's level-2 versions transform into level-3 codes by using recursion. The new recursive codes are written in FORTRAN 77, which does not support recursion as a language feature. Gaussian elimination with partial pivoting and Cholesky factorization are considered. Very clear algorithms emerge with the use of recursion. The recursive codes do exactly the same computation as the LAPACK codes, and a single recursive code replaces both the level-2 and level-3 versions of the corresponding LAPACK codes. We present an analysis of the recursive algorithm in terms of both FLOP count and storage usage. With recursion, the matrix operands are more “squarish.” The total area of the submatrices used in the recursive algorithm is less than the total area used by the LAPACK level-3 right-/left-looking algorithms. We quantify the difference; we also quantify how the FLOPS are computed. Also, we show that the algorithms exhibit high performance on RISC-type processors. In fact, except for small matrices, the recursive version outperforms the level-3 LAPACK versions of DGETRF and DPOTRF on an RS/6000™ workstation. For the level-2 versions, the performance gain approaches a factor of 3. We also demonstrate that a change to the LAPACK DLASWP routine can improve the performance of both the recursive version and DGETRF by more than 15 percent.
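
To make the idea of recursive blocking concrete, here is a minimal sketch of a recursive Cholesky factorization. It is an illustration only, not the authors' FORTRAN 77 code: it is written in plain Python, it relies on the language's own recursion (which the abstract notes FORTRAN 77 lacks), and the function name `recursive_cholesky` and the choice to recurse down to 1×1 blocks are assumptions made for this example. The matrix is split roughly in half at each level, so the triangular solve and the symmetric update work on progressively more "squarish" blocks without any explicit block-size parameter.

```python
import math


def recursive_cholesky(a, lo=0, hi=None):
    """Overwrite the lower triangle of a[lo:hi][lo:hi] with its Cholesky factor L.

    `a` is a symmetric positive definite matrix stored as a list of lists.
    Only the lower triangle is read and written; the strict upper triangle
    is left untouched. (Hypothetical helper written for illustration.)
    """
    if hi is None:
        hi = len(a)
    n = hi - lo
    if n == 1:
        # Base case: a 1x1 block is factored by a square root.
        if a[lo][lo] <= 0.0:
            raise ValueError("matrix is not positive definite")
        a[lo][lo] = math.sqrt(a[lo][lo])
        return a

    mid = lo + n // 2  # split point chosen by the recursion, not by the caller

    # Step 1: factor the leading block, A11 = L11 * L11^T.
    recursive_cholesky(a, lo, mid)

    # Step 2: triangular solve, L21 = A21 * L11^(-T).
    # In a BLAS-based code this would be a single level-3 call (e.g. DTRSM).
    for i in range(mid, hi):
        for j in range(lo, mid):
            s = a[i][j] - sum(a[i][k] * a[j][k] for k in range(lo, j))
            a[i][j] = s / a[j][j]

    # Step 3: symmetric update of the trailing block, A22 -= L21 * L21^T.
    # In a BLAS-based code this would be a level-3 call (e.g. DSYRK).
    for i in range(mid, hi):
        for j in range(mid, i + 1):
            a[i][j] -= sum(a[i][k] * a[j][k] for k in range(lo, mid))

    # Step 4: factor the updated trailing block.
    recursive_cholesky(a, mid, hi)
    return a
```

A short usage example: calling `recursive_cholesky` on the list-of-lists form of a symmetric positive definite matrix leaves the factor L in the lower triangle (diagonal included), so `a[i][j]` for `j <= i` holds the entries of L after the call. The inner loops here stand in for the level-3 BLAS operations that a high-performance version would call on the same blocks.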