Optimizing Memory Bandwidth Use and Performance for Matrix-Vector Multiplication in Iterative Methods

DAVID BOLAND; GEORGE A. CONSTANTINIDES

Abstract

Computing the solution to a system of linear equations is a fundamental problem in scientific computing, and its acceleration has drawn wide interest in the FPGA community [Morris et al. 2006; Zhang et al. 2008; Zhuo and Prasanna 2006]. One class of algorithms to solve these systems, iterative methods, has drawn particular interest, with recent literature showing large performance improvements over General-Purpose Processors (GPPs) [Lopes and Constantinides 2008]. In several iterative methods, this performance gain is largely a result of parallelization of the matrix-vector multiplication, an operation that occurs in many applications and hence has also been widely studied on FPGAs [Zhuo and Prasanna 2005; El-Kurdi et al. 2006]. However, whilst the performance of matrix-vector multiplication on FPGAs is generally I/O bound [Zhuo and Prasanna 2005], the nature of iterative methods allows the use of on-chip memory buffers to increase the bandwidth, providing the potential for significantly more parallelism [deLorimier and DeHon 2005]. Unfortunately, existing approaches have generally only either been capable of solving large matrices with limited improvement over GPPs [Zhuo and Prasanna 2005; El-Kurdi et al. 2006; deLorimier and DeHon 2005], or achieve high performance for relatively small matrices [Lopes and Constantinides 2008; Boland and Constantinides 2008]. This article proposes hardware designs to take advantage of symmetrical and banded matrix structure, as well as methods to optimize the RAM use, in order to both increase the performance and retain this performance for larger-order matrices.
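To make the role of symmetric and banded structure concrete, the following is a minimal software sketch, not the hardware design proposed in the article. It assumes a symmetric matrix whose nonzeros lie within a fixed number of diagonals of the main diagonal, and it stores only the main diagonal and the superdiagonals. The function name `sym_banded_matvec` and the `band` layout are hypothetical and chosen for illustration only. Each stored off-diagonal element is read once and used twice, which is one way symmetry can reduce the memory traffic of a matrix-vector multiplication.

```python
def sym_banded_matvec(band, x):
    """Compute y = A @ x for a symmetric banded matrix A.

    Assumed (hypothetical) storage layout:
      band[d][i] == A[i][i + d] for d = 0..k and i = 0..n-d-1,
    so band[0] is the main diagonal and band[d] is the d-th superdiagonal.
    The subdiagonals are never stored because A[i + d][i] == A[i][i + d].
    """
    n = len(x)
    y = [0.0] * n

    # Main diagonal: each element contributes to exactly one output.
    for i in range(n):
        y[i] += band[0][i] * x[i]

    # Off-diagonals: each stored element is loaded once and used twice,
    # once for the upper triangle and once for its mirror image.
    for d in range(1, len(band)):
        for i in range(n - d):
            a = band[d][i]
            y[i] += a * x[i + d]      # contribution of A[i][i + d]
            y[i + d] += a * x[i]      # contribution of A[i + d][i]
    return y


# Usage: a 4x4 tridiagonal matrix with 2 on the diagonal and -1 beside it.
band = [[2.0, 2.0, 2.0, 2.0], [-1.0, -1.0, -1.0]]
x = [1.0, 2.0, 3.0, 4.0]
y = sym_banded_matvec(band, x)
```

For a matrix of order n with k superdiagonals, this layout holds roughly n(k + 1) values instead of n squared, which is why banded storage matters when the working set has to fit in limited on-chip memory.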


Bibliographic record

Source: ACM Transactions on Reconfigurable Technology and Systems, 2011, Issue 3, pp. 15-28 (14 pages)
Authors: DAVID BOLAND; GEORGE A. CONSTANTINIDES
Affiliations: Imperial College London; Imperial College London
Original format: PDF
Language: English
Keywords: iterative methods; integer linear programming

Similar literature

1. Cache simulation for irregular memory traffic on multi-core CPUs: Case study on performance models for sparse matrix-vector multiplication [J]. James D. Trotter, Johannes Langguth, Xing Cai. Journal of Parallel and Distributed Computing, 2020, Issue Octa.
2. Performance optimization of Sparse Matrix-Vector Multiplication for multi-component PDE-based applications using GPUs [J]. Abdelfattah Ahmad, Ltaief Hatem, Keyes David. Concurrency and Computation: Practice and Experience, 2016, Issue 12.
3. Performance modeling and optimization of sparse matrix-vector multiplication on NVIDIA CUDA platform [J]. Shiming Xu, Wei Xue, Hai Xiang Lin. Journal of Supercomputing, 2013, Issue 3.
4. Optimising Memory Bandwidth Use for Matrix-Vector Multiplication in Iterative Methods [C]. David Boland, George A. Constantinides. Reconfigurable Computing: Architectures, Tools and Applications, 2010.
5. Analysis of High Performance Sparse Matrix-Vector Multiplication for Small Finite Fields [D]. Lambert, Matthew A. 2020.
6. Hierarchical Orthogonal Matrix Generation and Matrix-Vector Multiplications in Rigid Body Simulations [O]. Fuhui Fang, Jingfang Huang, Gary Huber.
7. Performance Analysis and Optimization of Sparse Matrix-Vector Multiplication on Modern Multi- and Many-Core Processors [O]. Elafrou, Athena; Goumas, Georgios; Koziris, Nektarios. 2017.
