Accelerating a Sparse Matrix Iterative Solver Using a High Performance Reconfigurable Computer


Abstract

High performance reconfigurable computers (HPRCs), which combine general-purpose processors (GPPs) and field programmable gate arrays (FPGAs), are now commercially available. These interesting architectures allow for the creation of reconfigurable processors. HPRCs have already been used to accelerate integer and fixed-point applications. However, extensive parallelism and deeply pipelined floating-point cores are necessary to make MHz-scale FPGAs competitive with GHz-scale GPPs, thus making it difficult to accelerate certain kinds of floating-point kernels. Kernels with variable length nested loops, e.g., sparse matrix-vector multiply, have been problematic because of the loop-carried dependence associated with the pipelined floating-point units. While hardware description language (HDL)-based kernels have shown moderate success in addressing this problem, the use of a high-level language (HLL)-based approach to accelerate such applications has been rather elusive. If HPRCs are to become a part of mainstream military and scientific computing, we should emphasize the use of HLL-based programming, whenever possible, rather than HDL-based hardware design. The primary reason is the increased programmer productivity associated with HLLs when compared with HDLs. For example, the floating-point addition statement z = x+y, a single line in an HLL, corresponds to hundreds of lines of HDL. In this paper, we describe the design and implementation of a sparse matrix Jacobi processor to solve systems of linear equations, Ax=b. The parallelized, deeply pipelined, IEEE-754-compliant 32-bit floating-point sparse matrix Jacobi iterative solver runs on a contemporary HPRC. The FPGA-based components are implemented using only an HLL (the C programming language) and the Carte HLL-to-HDL compiler. 
An HLL-based streaming accumulator allows for the implementation of fully pipelined loops and results in a 2.5-fold wall clock runtime speedup when compared with an equivalent software-only implementation.
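
To make the abstract's terms concrete, here is a minimal software-only sketch in C of a Jacobi iteration for Ax=b over a sparse matrix. It is not the authors' Carte/FPGA design: the compressed sparse row (CSR) layout, the `csr_matrix` type, and the function names `jacobi_sweep` and `jacobi_solve` are all assumptions made for illustration, since the abstract does not describe the storage format or code. The inner loop shows the variable-length accumulation whose running sum is the loop-carried dependence the abstract refers to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CSR (compressed sparse row) container; the paper's actual
   data layout is not given in the abstract. */
typedef struct {
    size_t n;              /* number of rows (square matrix) */
    const size_t *row_ptr; /* n + 1 offsets into col and val */
    const size_t *col;     /* column index of each nonzero */
    const float *val;      /* nonzero values, 32-bit as in the abstract */
} csr_matrix;

/* One Jacobi sweep:
   x_new[i] = (b[i] - sum over j != i of A[i][j] * x_old[j]) / A[i][i].
   Returns the largest absolute change in any component. */
static float jacobi_sweep(const csr_matrix *a, const float *b,
                          const float *x_old, float *x_new)
{
    float max_delta = 0.0f;
    for (size_t i = 0; i < a->n; ++i) {
        float sum = 0.0f;
        float diag = 0.0f;
        /* Variable-length inner loop: the trip count is the number of
           nonzeros in row i. Each addition into `sum` depends on the
           previous one, which is the loop-carried dependence that stalls
           a deeply pipelined floating-point adder. */
        for (size_t k = a->row_ptr[i]; k < a->row_ptr[i + 1]; ++k) {
            if (a->col[k] == i)
                diag = a->val[k];
            else
                sum += a->val[k] * x_old[a->col[k]];
        }
        assert(diag != 0.0f); /* Jacobi requires a nonzero diagonal */
        x_new[i] = (b[i] - sum) / diag;
        float delta = x_new[i] - x_old[i];
        if (delta < 0.0f) delta = -delta;
        if (delta > max_delta) max_delta = delta;
    }
    return max_delta;
}

/* Repeat sweeps until the largest update is below tol or max_iter sweeps
   have run. x holds the initial guess on entry and the result on exit;
   scratch is caller-provided workspace of length n.
   Returns the number of sweeps performed. */
static int jacobi_solve(const csr_matrix *a, const float *b, float *x,
                        float *scratch, float tol, int max_iter)
{
    int it = 0;
    while (it < max_iter) {
        float delta = jacobi_sweep(a, b, x, scratch);
        for (size_t i = 0; i < a->n; ++i) x[i] = scratch[i];
        ++it;
        if (delta < tol) break;
    }
    return it;
}
```

A caller would fill a `csr_matrix` for a diagonally dominant system, pass a zero initial guess in `x`, and call `jacobi_solve(&a, b, x, scratch, 1e-6f, 100)`. In this sketch the rows are processed sequentially; the streaming accumulator mentioned in the abstract addresses the stall on `sum` in hardware, which plain C on a GPP does not model.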


Bibliographic details

Source
2010 DoD High Performance Computing Modernization Program Users Group Conference | 2011 | pp. 517-523 | 7 pages
Authors
Morris Gerald R.; McGruder Ricky Y.; Abed Khalid H.;
Original format: PDF
Chinese Library Classification: Computer applications; Atmospheric sciences (meteorology)
Keywords
FPGA; iterative solver; reconfigurable computer; sparse matrix;


Similar literature

1. GPU-Accelerated Sparse Matrix-Matrix Multiplication by Iterative Row Merging [J] . Gremse Felix, Hoefter Andreas, Schwen Lars Ole . SIAM Journal on Scientific Computing . 2015, No. 1
2. Iterative sparse matrix-vector multiplication for accelerating the block Wiedemann algorithm over GF(2) on multi-graphics processing unit systems [J] . Bertil Schmidt, Hans Aribowo, Hoang-Vu Dang . Concurrency and Computation . 2013, No. 4
3. Computer implementations of iterative and non-iterative crystal plasticity solvers on high performance graphics hardware [J] . Savage Daniel J., Knezevic Marko . Computational Mechanics: Solids, Fluids, Fracture Transport Phenomena and Variational Methods . 2015, No. 4
4. Accelerating a Sparse Matrix Iterative Solver Using a High Performance Reconfigurable Computer [C] . Morris Gerald R., McGruder Ricky Y., Abed Khalid H. DoD High Performance Computing Modernization Program Users Group Conference . 2010
5. Iterative Solver Selection Techniques for Sparse Linear Systems [D] . Sood, Kanika . 2019
6. Accelerated Time-of-Flight Magnetic Resonance Angiography with Sparse Undersampling and Iterative Reconstruction for the Evaluation of Intracranial Arteries [O] . Hehan Tang, Na Hu, Yuan Yuan . 2019
7. Solving large sparse linear systems efficiently on Grid computers using an asynchronous iterative method as a preconditioner [O] . T. P. Collignon, M. B. Van Gijzen . 2008
