A High Throughput FPGA-Based Floating Point Conjugate Gradient Implementation for Dense Matrices

Antonio Roldao; George A. Constantinides

Abstract

Recent increases in the capacity of modern Field Programmable Gate Arrays (FPGAs) have significantly expanded their range of applications. One such field is the acceleration of scientific computation, in which a commonplace calculation is the solution of systems of linear equations. The Conjugate Gradient (CG) algorithm has proven in software to be an efficient and robust method for finding such solutions. In this article we present a widely parallel and deeply pipelined hardware CG implementation targeted at modern FPGA architectures. The implementation is particularly suited to accelerating multiple small-to-medium-sized dense systems of linear equations, and it can be used either as a stand-alone solver or as a building block for solving higher-order systems. We show that, through parallelization, the computation time per iteration for a matrix of order n can be reduced from Θ(n²) clock cycles on a microprocessor to Θ(n) on an FPGA. Deep pipelining further makes it possible to solve several problems in parallel, maximizing both performance and efficiency. I/O requirements are shown to scale well, converging to a constant value as the matrix order increases. Post-place-and-route results on a readily available VirtexII-6000 demonstrate a sustained performance of 5 GFlops, and results on a Virtex5-330 indicate a sustained performance of 35 GFlops. A comparison with an optimized software implementation running on a high-end CPU shows that this FPGA implementation achieves a speedup of at least an order of magnitude.
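To make the Θ(n²)-to-Θ(n) claim concrete, here is a minimal Python sketch; it is not the authors' hardware design. It pairs textbook CG for a dense symmetric positive-definite matrix with a deliberately simplified cycle-count model. The model's assumptions are ours, not the article's: the processor performs one multiply-add per cycle, a hypothetical FPGA datapath accepts one full matrix row per clock, each CG iteration incurs a single pipeline-fill stall of `pipeline_latency` cycles, and interleaving independent problems hides that stall. The helper names (`cg_iteration_cycles_cpu`, `cg_iteration_cycles_fpga`, `cycles_for_problems`) are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    # Dense matrix-vector product: n*n multiply-adds. On a processor doing one
    # multiply-add per cycle this is the Theta(n^2) term in each CG iteration.
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Textbook CG for a dense symmetric positive-definite matrix A."""
    n = len(b)
    if max_iter is None:
        max_iter = n  # in exact arithmetic CG converges in at most n iterations
    x = [0.0] * n
    r = list(b)  # residual b - A x, with initial guess x = 0
    p = list(r)  # first search direction
    rs_old = dot(r, r)
    iters = 0
    while iters < max_iter and math.sqrt(rs_old) > tol:
        q = matvec(A, p)
        alpha = rs_old / dot(p, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
        iters += 1
    return x, iters


def cg_iteration_cycles_cpu(n):
    # Simplified model: one multiply-add per cycle, dominated by the n*n mat-vec.
    return n * n


def cg_iteration_cycles_fpga(n, pipeline_latency):
    # Hypothetical model: n multipliers feeding an adder tree accept one matrix
    # row per clock, so the mat-vec streams in n cycles plus one pipeline fill.
    return n + pipeline_latency


def cycles_for_problems(n, num_problems, pipeline_latency, interleaved):
    # Run back to back, each problem pays its own pipeline-fill stall.
    # Interleaved, independent problems fill the pipeline and share one stall.
    if interleaved:
        return num_problems * n + pipeline_latency
    return num_problems * cg_iteration_cycles_fpga(n, pipeline_latency)
```

For example, `conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` converges in two iterations to approximately `[1/11, 7/11]`. Under the toy model with n = 64 and a 30-cycle latency, four interleaved problems take 286 cycles per iteration versus 376 when run back to back, which illustrates why deep pipelining favors solving several systems at once.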

Bibliographic Details

Source: ACM Transactions on Reconfigurable Technology and Systems, 2010, no. 1, pp. 1.1-1.19 (19 pages)
Authors: Antonio Roldao; George A. Constantinides
Affiliation (both authors): Department of Electrical & Electronic Engineering, Imperial College London, South Kensington Campus, London, UK
Original format: PDF
Language: English
Keywords: algorithms; design; performance

Similar Literature

1. SOMprocessor: A high throughput FPGA-based architecture for implementing Self-Organizing Maps and its application to video processing [J]. Neural Networks: The Official Journal of the International Neural Network Society, 2020.
2. High-throughput parallel DWT hardware architecture implemented on an FPGA-based platform [J]. Ibraheem Mohammed Shaaban, Hachicha Khalil, Ahmed Syed Zahid. Journal of Real-Time Image Processing, 2019, no. 6.
3. Design and implementation of high throughput FPGA-based DVB-T system [J]. Ayat Mehdi, Hardani Hossein, Mirzakuchaki Sattar. Computers and Electrical Engineering, 2016.
4. A High Throughput FPGA-Based Floating Point Conjugate Gradient Implementation [C]. Antonio Roldao Lopes, George A. Constantinides. Reconfigurable Computing: Architectures, Tools and Applications, 2008.
5. Investigation of anisotropic charge transport in conjugated polymer based organic FETs by controlling the molecular orientation in large area ribbon-shaped floating films [D]. Tripathi Atul Shankar Mani, 2019.
6. Participation of a transmembrane proton gradient in 5-hydroxytryptamine transport by platelet dense granules and dense-granule ghosts [O]. J A Wilkins, L Salganicoff, 1981.
7. A High Throughput FPGA-Based Floating Point Conjugate Gradient Implementation for Dense Matrices [O]. Roldao A, Constantinides GA, 2010.
8. Conjugate gradient type methods for linear systems with complex symmetric coefficient matrices [R]. Freund, Roland, 1989.