Journal: ACM Transactions on Mathematical Software

Exploiting Parallelism in Matrix-Computation Kernels for Symmetric Multiprocessor Systems

Abstract

We present a simple and efficient methodology for the development, tuning, and installation of matrix algorithms such as the hybrid Strassen's and Winograd's fast matrix multiplication, or their combination with the 3M algorithm for complex matrices. (Hybrid here means a recursive algorithm, such as Strassen's, that recurses until a highly tuned BLAS matrix multiplication offers a performance advantage.) We investigate how modern Symmetric Multiprocessor (SMP) architectures present old and new challenges that can be addressed by combining algorithm design with careful and natural exploitation of parallelism at the function level, through optimizations such as function-call parallelism, function percolation, and function software pipelining.

We have three contributions. First, we present a performance overview for double- and double-complex-precision matrices on state-of-the-art SMP systems. Second, we introduce new algorithm implementations: a variant of the 3M algorithm and two new schedules of Winograd's matrix multiplication, which achieve up to 20% speedup with respect to regular matrix multiplication. Of the two Winograd schedules, one is designed to minimize the number of matrix additions and the other to minimize the computation latency of matrix additions. Third, we apply software pipelining and thread allocation to all the algorithms and show that this yields up to 10% further performance improvement.
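
To make the terms "hybrid" and "3M" concrete, the following is a minimal Python sketch, not the authors' implementation. It uses the textbook Strassen recursion (the paper's Winograd schedules and its parallel optimizations are not shown), assumes square matrices stored as lists of lists, and uses a plain triple-loop `matmul` as a stand-in for a tuned BLAS GEMM. The names `hybrid_strassen`, `three_m`, and `cutoff` are illustrative assumptions.

```python
def matmul(a, b):
    """Classical O(n^3) product; a stand-in for a tuned BLAS GEMM call."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def add(a, b):
    """Element-wise matrix sum."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a, b):
    """Element-wise matrix difference."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split(m):
    """Split an even-sized square matrix into its four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def hybrid_strassen(a, b, cutoff=2):
    """Recurse with Strassen's 7 products until the problem is small
    (or odd-sized), then hand over to the classical kernel."""
    n = len(a)
    if n <= cutoff or n % 2:
        return matmul(a, b)
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)

    def rec(x, y):
        return hybrid_strassen(x, y, cutoff)

    m1 = rec(add(a11, a22), add(b11, b22))
    m2 = rec(add(a21, a22), b11)
    m3 = rec(a11, sub(b12, b22))
    m4 = rec(a22, sub(b21, b11))
    m5 = rec(add(a11, a12), b22)
    m6 = rec(sub(a21, a11), add(b11, b12))
    m7 = rec(sub(a12, a22), add(b21, b22))
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    # Reassemble the four quadrants into one matrix.
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


def three_m(ar, ai, br, bi, real_mul=hybrid_strassen):
    """3M algorithm: (ar + i*ai)(br + i*bi) with three real matrix
    products instead of four. Returns (real part, imaginary part)."""
    t1 = real_mul(ar, br)
    t2 = real_mul(ai, bi)
    t3 = real_mul(add(ar, ai), add(br, bi))
    return sub(t1, t2), sub(sub(t3, t1), t2)
```

In practice the cutoff would be tuned per machine, which is the "tuning and installation" step the abstract refers to. The value `cutoff=2` here is only small enough to exercise the recursion on toy inputs.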
