A framework for high-performance matrix multiplication based on hierarchical abstractions, algorithms and optimized low-level kernels

Vinod Valsalam; Anthony Skjellum

Abstract

Despite extensive research, optimal performance for matrix multiplication (especially for large matrices) has previously been difficult to achieve on most architectures, because of the lack of a structured approach and the limitations imposed by matrix storage formats. A simple but effective framework is presented here that lays the foundation for building high-performance matrix-multiplication codes in a structured, portable and efficient manner. The resulting codes are validated on three representative RISC and CISC architectures, on which they significantly outperform highly optimized libraries such as ATLAS and other competing methodologies reported in the literature. The main component of the proposed approach is a hierarchical storage format that efficiently generalizes the memory-hierarchy-friendly Morton ordering to matrices of arbitrary size. The storage format supports polyalgorithms, which are shown here to be essential for obtaining the best possible performance across a range of problem sizes. Several algorithmic advances are made in this paper, including an oscillating iterative algorithm for matrix multiplication and a variable recursion cutoff criterion for Strassen's algorithm. The authors identify the need to standardize linear algebra kernel interfaces, distinct from the BLAS, for writing portable high-performance code. These kernel routines operate on small blocks that fit in the L1 cache. The performance advantages of the proposed framework can be effectively delivered to new and existing applications through object-oriented or compiler-based approaches.
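The Morton (Z-order) layout named in the abstract stores each quadrant of a block contiguously, so a recursive multiply works on progressively smaller contiguous regions that eventually fit in cache. The following is a minimal Python sketch under stated assumptions: square matrices whose dimension is a power of two (the paper's hierarchical format generalizes Morton ordering to arbitrary sizes, which this sketch does not attempt), a plain eight-product recursion rather than Strassen's algorithm, and a fixed `cutoff` below which a naive loop stands in for a tuned L1-resident kernel. All function names (`morton_index`, `to_morton`, `from_morton`, `morton_matmul`) are hypothetical and not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def morton_index(i: int, j: int) -> int:
    """Interleave the bits of row i and column j (Z-order).

    Column bits go to even positions and row bits to odd positions, so the
    quadrants of a 2^k x 2^k block appear in the order NW, NE, SW, SE.
    """
    z, bit = 0, 0
    while (i >> bit) or (j >> bit):
        z |= ((j >> bit) & 1) << (2 * bit)
        z |= ((i >> bit) & 1) << (2 * bit + 1)
        bit += 1
    return z


def to_morton(M: Matrix) -> List[float]:
    """Copy a row-major n x n matrix into a flat Morton-ordered buffer."""
    n = len(M)
    buf = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            buf[morton_index(i, j)] = M[i][j]
    return buf


def from_morton(buf: List[float], n: int) -> Matrix:
    """Convert a Morton-ordered buffer back to row-major lists."""
    return [[buf[morton_index(i, j)] for j in range(n)] for i in range(n)]


def _kernel(A, a, B, b, C, c, m):
    """Naive C += A*B on one m x m Morton block.

    Hypothetical stand-in for a tuned L1-cache kernel; an aligned
    power-of-two block is itself Morton-ordered from its offset.
    """
    for i in range(m):
        for j in range(m):
            s = 0.0
            for k in range(m):
                s += A[a + morton_index(i, k)] * B[b + morton_index(k, j)]
            C[c + morton_index(i, j)] += s


def _matmul_z(A, a, B, b, C, c, n, cutoff):
    """Recursive C += A*B; each quadrant is a contiguous run of (n/2)^2 items."""
    if n <= cutoff:
        _kernel(A, a, B, b, C, c, n)
        return
    h = n // 2
    q = h * h  # quadrant offsets: X11 = 0, X12 = q, X21 = 2q, X22 = 3q
    for ci, ai, bi in ((0, 0, 0), (0, 1, 2),   # C11 += A11*B11 + A12*B21
                       (1, 0, 1), (1, 1, 3),   # C12 += A11*B12 + A12*B22
                       (2, 2, 0), (2, 3, 2),   # C21 += A21*B11 + A22*B21
                       (3, 2, 1), (3, 3, 3)):  # C22 += A21*B12 + A22*B22
        _matmul_z(A, a + ai * q, B, b + bi * q, C, c + ci * q, h, cutoff)


def morton_matmul(A: Matrix, B: Matrix, cutoff: int = 4) -> Matrix:
    """Multiply two n x n matrices via Morton-ordered recursive blocking."""
    n = len(A)
    if n == 0 or n & (n - 1):
        raise ValueError("sketch assumes n is a power of two")
    if cutoff < 1:
        raise ValueError("cutoff must be >= 1")
    C = [0.0] * (n * n)
    _matmul_z(to_morton(A), 0, to_morton(B), 0, C, 0, n, cutoff)
    return from_morton(C, n)
```

For example, `morton_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], cutoff=1)` returns `[[19.0, 22.0], [43.0, 50.0]]`. In a tuned implementation the cutoff would typically be chosen so that the operand blocks fit in the L1 cache, and the kernel would avoid recomputing Morton indices in its inner loop.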

Bibliographic details

Source
Concurrency and Computation, 2002, No. 10, pp. 805-839 (35 pages)
Authors
Vinod Valsalam; Anthony Skjellum
Affiliation

High Performance Computing Laboratory, Department of Computer Science, Mississippi State University, MS 39762, U.S.A.

Original format: PDF
Language: English
Chinese Library Classification: computing technology; computer technology
Keywords
matrix multiplication; hierarchical matrix storage; Morton order; polyalgorithms; Strassen's algorithm; kernel interface


Similar documents

1. Langping Tang, Yuren Zhou, Zefeng Chen. "A Dominance-Based Constrained Optimization Evolutionary Algorithm for the 4-th Tensor Power Problem of Matrix Multiplication." International Conference on Cloud Computing and Security, 2018.
2. Michael Maclean Wolf. "Hypergraph-based combinatorial optimization of matrix-vector multiplication." Dissertation, 2009.
3. Lu Yu, Chien-Te Hsieh. "Hierarchical Lignin-Based Carbon Matrix and Carbon Dot Composite Electrodes for High-Performance Supercapacitors." 2021.
4. I. Masliah, A. Abdelfattah, A. Haidar. "Algorithms and optimization techniques for high-performance matrix-matrix multiplications of very small matrices." 2019.
