Exploiting Parallelism in Matrix-Computation Kernels for Symmetric Multiprocessor Systems

PAOLO DALBERTO; MARCO BODRATO; ALEXANDRU NICOLAU


Abstract

We present a simple and efficient methodology for the development, tuning, and installation of matrix algorithms such as the hybrid Strassen's and Winograd's fast matrix multiply, or their combination with the 3M algorithm for complex matrices (hybrid meaning that a recursive algorithm such as Strassen's is applied until a highly tuned BLAS matrix multiplication offers a performance advantage). We investigate how modern Symmetric Multiprocessor (SMP) architectures present old and new challenges that can be addressed by combining algorithm design with careful and natural exploitation of parallelism at the function level, through optimizations such as function-call parallelism, function percolation, and function software pipelining. We make three contributions. First, we present a performance overview for double- and double-complex-precision matrices on state-of-the-art SMP systems. Second, we introduce new algorithm implementations: a variant of the 3M algorithm and two new schedules of Winograd's matrix multiplication (achieving up to 20% speedup with respect to regular matrix multiplication); of the two Winograd schedules, one is designed to minimize the number of matrix additions and the other to minimize the computation latency of matrix additions. Third, we apply software pipelining and thread allocation to all the algorithms and show that this yields up to 10% further performance improvement.
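The two ideas named in the abstract, a hybrid recursion and the 3M algorithm, can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the textbook Strassen recurrence rather than their Winograd schedules, the textbook 3M formula rather than their variant, and no parallelism. All function names and the `cutoff` parameter are hypothetical, matrices are assumed square and stored as lists of lists, and `classical_mul` merely stands in for a tuned BLAS GEMM.

```python
def classical_mul(A, B):
    """Plain triple-loop product; a stand-in for a tuned BLAS GEMM (assumption)."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(inner)) for j in range(cols)]
            for i in range(rows)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def split(A):
    """Return the four quadrants of a square matrix of even size."""
    h = len(A) // 2
    return ([r[:h] for r in A[:h]], [r[h:] for r in A[:h]],
            [r[:h] for r in A[h:]], [r[h:] for r in A[h:]])

def hybrid_strassen(A, B, cutoff=2):
    """Recurse with Strassen's 7 products until the size reaches `cutoff`
    (or is odd), then hand the subproblem to the classical kernel."""
    n = len(A)
    if n <= cutoff or n % 2:
        return classical_mul(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    rec = lambda X, Y: hybrid_strassen(X, Y, cutoff)
    M1 = rec(add(A11, A22), add(B11, B22))
    M2 = rec(add(A21, A22), B11)
    M3 = rec(A11, sub(B12, B22))
    M4 = rec(A22, sub(B21, B11))
    M5 = rec(add(A11, A12), B22)
    M6 = rec(sub(A21, A11), add(B11, B12))
    M7 = rec(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    # Stitch the quadrants back into one matrix.
    return ([l + r for l, r in zip(C11, C12)] +
            [l + r for l, r in zip(C21, C22)])

def complex_mul_3m(Ar, Ai, Br, Bi, real_mul=hybrid_strassen):
    """3M: (Ar + i*Ai)(Br + i*Bi) using three real products instead of four."""
    T1 = real_mul(Ar, Br)
    T2 = real_mul(Ai, Bi)
    T3 = real_mul(add(Ar, Ai), add(Br, Bi))
    return sub(T1, T2), sub(sub(T3, T1), T2)  # (real part, imaginary part)
```

In this sketch `cutoff` plays the role of the tuning point at which the fast recursion stops paying off; in practice that point would be found empirically per machine, which is the kind of tuning the abstract refers to.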


Bibliographic record

Source
ACM Transactions on Mathematical Software | 2012, Issue 1 | pp. 2.1-2.30 | 30 pages
Authors
PAOLO DALBERTO; MARCO BODRATO; ALEXANDRU NICOLAU
Author affiliations

Yahoo! Sunnyvale, CA;

University of Rome II, Tor Vergata, Italy;

University of California at Irvine, CA;

Original format: PDF
Language: English
Keywords
matrix multiplications; fast algorithms; software pipeline; parallelism;


