International Conference on Information Science, Parallel and Distributed Systems
A Time-efficient and High-performance FPGA-based Continuous Floating-point Matrix Computing Accelerating Architecture for Control System

Abstract

Matrix computing is one of the most important linear algebra operations and is widely used in both scientific and engineering applications. There is still considerable room for optimizing the acceleration of continuous matrix computing. In this study, we first present two memory access optimization schemes that significantly reduce the I/O time and the total delay. We then extend continuous matrix computing from double-precision to single-precision and half-precision floating-point data, which broadens the range of supported data types and improves computing performance. The experiments show that the I/O time is reduced by 40% after coarse-grained parallel optimization. Moreover, the I/O time is almost completely hidden by the calculation time after fine-grained data flow optimization. The accelerator achieves a maximum frequency of 180 MHz with 128 PEs and delivers 184.3 GFLOPS for half-precision floating-point data. Compared with state-of-the-art FPGA-based structures, our design offers better time efficiency and a wider application scope.
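
As a rough illustration of two claims in the abstract, the sketch below models (a) how overlapping data transfer with computation can hide I/O time, and (b) how a peak-throughput figure follows from PE count and clock frequency. It is a back-of-the-envelope model, not the paper's architecture: the function names, the per-tile timing model, and the example timings are all hypothetical. In particular, the value of 8 floating-point operations per PE per cycle is an inference that happens to reproduce the reported 184.3 GFLOPS; the abstract does not state it.

```python
def serial_time(n_tiles, t_io, t_compute):
    """Total time when each tile is loaded, then computed, with no overlap."""
    return n_tiles * (t_io + t_compute)


def overlapped_time(n_tiles, t_io, t_compute):
    """Total time with double buffering (an assumed scheme, for illustration).

    The first load and the last compute cannot be overlapped; in between,
    each step costs whichever of I/O or compute is slower.
    """
    if n_tiles == 0:
        return 0
    return t_io + (n_tiles - 1) * max(t_io, t_compute) + t_compute


def peak_gflops(num_pes, clock_hz, flops_per_pe_per_cycle):
    """Peak throughput in GFLOPS for a given PE count and clock frequency."""
    return num_pes * clock_hz * flops_per_pe_per_cycle / 1e9


# Hypothetical timings (arbitrary units): 10 tiles, I/O = 2, compute = 5.
no_overlap = serial_time(10, 2, 5)        # 10 * (2 + 5) = 70
with_overlap = overlapped_time(10, 2, 5)  # 2 + 9 * 5 + 5 = 52
exposed_io = with_overlap - 10 * 5        # only the first load stays visible

# 128 PEs at 180 MHz; 8 FLOPs per PE per cycle is an ASSUMPTION chosen
# because it matches the reported half-precision figure.
half_precision_peak = peak_gflops(128, 180e6, 8)
```

When compute time per tile exceeds I/O time, the only I/O left on the critical path in this model is the initial load, which is one way to read "almost completely hidden by the calculation time".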