Scalable and modular algorithms for floating-point matrix multiplication on FPGAs

Abstract

Summary form only given. The abundant hardware resources on current FPGAs provide new opportunities to improve the performance of hardware implementations of scientific computations. We propose two FPGA-based algorithms for floating-point matrix multiplication, a fundamental kernel in a number of scientific applications. We analyze the design tradeoffs in implementing this kernel on FPGAs. Our algorithms employ a linear array architecture with a small control logic. This architecture effectively utilizes the hardware resources on the entire FPGA and reduces the routing complexity. The processing elements (PEs) used in our algorithms are modular so that floating-point units can be easily embedded into them. In our designs, the floating-point units are optimized to maximize the number of PEs integrated on the FPGA as well as the clock speed. Experimental results show that our algorithms achieve high clock speeds and provide good scalability. Our algorithms achieve superior sustained floating-point performance compared with existing FPGA-based implementations and state-of-the-art processors.
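The abstract does not describe the two algorithms in detail, so the following is only a minimal sketch of the general idea of a linear array of processing elements, not the authors' design. It assumes a hypothetical arrangement in which PE j stores column j of B, the elements of A are streamed through the chain one per cycle, and each PE passes its input to its right-hand neighbor. The function name `linear_array_matmul` and all variable names are illustrative assumptions.

```python
def linear_array_matmul(A, B):
    """Simulate C = A x B on a 1-D chain of PEs (illustrative only)."""
    m, n, p = len(A), len(B), len(B[0])
    # Assumed PE state: PE j stores column j of B and accumulates column j of C.
    b_cols = [[B[k][j] for k in range(n)] for j in range(p)]
    c_cols = [[0.0] * m for _ in range(p)]
    # Elements of A enter the array one per cycle, in row-major order.
    stream = [(i, k, A[i][k]) for i in range(m) for k in range(n)]
    # regs[j] holds the element of A that PE j sees in the current cycle.
    regs = [None] * p
    # The last element needs p - 1 extra cycles to reach the last PE.
    cycles = len(stream) + p - 1
    for t in range(cycles):
        incoming = stream[t] if t < len(stream) else None
        # Each PE only talks to its neighbor: shift the pipeline one step right.
        regs = [incoming] + regs[:-1]
        for j in range(p):
            if regs[j] is not None:
                i, k, a = regs[j]
                # One multiply-accumulate per PE per cycle.
                c_cols[j][i] += a * b_cols[j][k]
    C = [[c_cols[j][i] for j in range(p)] for i in range(m)]
    return C, cycles


C, cycles = linear_array_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C)       # [[19.0, 22.0], [43.0, 50.0]]
print(cycles)  # 5
```

In this sketch the only communication is between adjacent PEs, which is one way to read the abstract's point about low routing complexity and small control logic; the real designs may schedule and store data differently.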

Bibliographic details

Source
2004, p. 92, 1 page
Authors
Zhuo, L.; Prasanna, V.K.
Original format: PDF
Chinese Library Classification: Radio electronics, telecommunications technology
Keywords
floating point arithmetic; field programmable gate arrays; matrix multiplication; optimisation; clocks; logic design; computational complexity; hardware resources; scientific computation; FPGA-based algorithm; floating-point matrix multiplication; array architecture; control logic; routing complexity; processing element maximization; clock speed; state-of-the-art processor; modular algorithm; kernel design;

Similar literature

1. Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on Reconfigurable Computing Systems [J]. Ling Zhuo, Prasanna V.K. IEEE Transactions on Parallel and Distributed Systems. 2007, No. 4
2. Scalable and Modular Algorithms for Floating-Point Matrix Multiplication on Reconfigurable Computing Systems [J]. Ling Zhuo, Viktor K. Prasanna. IEEE Transactions on Parallel and Distributed Systems. 2007
3. Analysis of Blocking and Scheduling for FPGA-Based Floating-Point Matrix Multiplication / Analyse du blocage et de l’ordonnancement d’une multiplication matricielle à virgule flottante sur un FPGA [J]. Khayyat A., Manjikian N. Canadian Journal of Electrical and Computer Engineering. 2014, No. 2
4. Scalable and modular algorithms for floating-point matrix multiplication on FPGAs [C]. Zhuo L., Prasanna V.K. International Parallel and Distributed Processing Symposium. 2004
5. A novel algorithm for fixed-point and floating-point matrix multiplication on a FPGA [D]. Gandhi, Falguni. 2006
6. Quantum hyperparallel algorithm for matrix multiplication [O]. Xin-Ding Zhang, Xiao-Ming Zhang, Zheng-Yuan Xue
7. A Scalable Architecture for Accelerating Multi-operation and Continuous Floating-point Matrix Computing on FPGAs [O]. Longlong Zhang, Yuanxi Peng, Ahui Huang. 2020
