
Scalable and modular algorithms for floating-point matrix multiplication on FPGAs



Abstract

Summary form only given. The abundant hardware resources on current FPGAs provide new opportunities to improve the performance of hardware implementations of scientific computations. We propose two FPGA-based algorithms for floating-point matrix multiplication, a fundamental kernel in many scientific applications, and analyze the design tradeoffs in implementing this kernel on FPGAs. Our algorithms employ a linear array architecture with minimal control logic. This architecture makes effective use of the hardware resources across the entire FPGA and reduces routing complexity. The processing elements (PEs) used in our algorithms are modular, so floating-point units can be easily embedded into them. In our designs, the floating-point units are optimized to maximize both the number of PEs integrated on the FPGA and the clock speed. Experimental results show that our algorithms achieve high clock speeds and scale well. They also deliver higher sustained floating-point performance than existing FPGA-based implementations and state-of-the-art processors.
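The abstract does not spell out the dataflow of the two algorithms, so the following is only a minimal software sketch of how a linear array of modular PEs can compute a matrix product. It assumes one PE per column of the result, with each PE holding one column of B while elements of A are passed from neighbor to neighbor. The names `ProcessingElement` and `linear_array_matmul` are hypothetical, and the model ignores pipelining, clocking, and floating-point unit latency, all of which matter on a real FPGA.

```python
from typing import List

Matrix = List[List[float]]


class ProcessingElement:
    """Hypothetical PE: stores one column of B and accumulates one column of C."""

    def __init__(self, b_column: List[float], n_rows: int) -> None:
        self.b_column = b_column
        self.acc = [0.0] * n_rows  # partial sums for this PE's column of C

    def step(self, i: int, k: int, a_ik: float) -> float:
        # One multiply-accumulate, then forward the operand to the next PE.
        self.acc[i] += a_ik * self.b_column[k]
        return a_ik


def linear_array_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Compute C = A * B by streaming A through a chain of PEs (assumed dataflow)."""
    n, m, p = len(a), len(b), len(b[0])
    # PE j is preloaded with column j of B.
    pes = [ProcessingElement([b[k][j] for k in range(m)], n) for j in range(p)]
    for i in range(n):
        for k in range(m):
            value = a[i][k]
            for pe in pes:  # the operand only ever moves between neighbors
                value = pe.step(i, k, value)
    # Collect column j of C from PE j.
    return [[pes[j].acc[i] for j in range(p)] for i in range(n)]


if __name__ == "__main__":
    print(linear_array_matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
```

In this model each PE communicates only with its immediate neighbor, which is one plausible reading of why a linear array keeps control logic small and routing simple; the actual designs in the paper may differ.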
