A Time-efficient and High-performance FPGA-based Continuous Floating-point Matrix Computing Accelerating Architecture for Control System

Abstract

Matrix computing is one of the most important linear algebra operations and is widely used in both scientific and engineering applications. There is still considerable room for optimizing the acceleration of continuous matrix computing. In this study, we first present two memory access optimization schemes that significantly reduce the I/O time and the total delay. We then extend the data precision supported for continuous matrix computing from double-precision to single-precision and half-precision floating-point data, which broadens the range of supported data types and improves computing performance. Experiments show that the I/O time is reduced by 40% after coarse-grained parallel optimization. Moreover, the I/O time is almost completely hidden by the computation time after fine-grained data flow optimization. The accelerator achieves a maximum frequency of 180 MHz with 128 PEs and delivers 184.3 GFLOPS for half-precision floating-point data. Compared with state-of-the-art FPGA-based structures, our design offers better time efficiency and a wider application scope.
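The two quantitative claims in the abstract can be illustrated with a small model. The sketch below is not taken from the paper: the function names, the block timings, and in particular the figure of 8 floating-point operations per PE per cycle are assumptions. The abstract does not state the per-PE throughput; 8 is chosen only because it is consistent with the reported 184.3 GFLOPS at 128 PEs and 180 MHz. The second function models one common way of hiding I/O behind computation (loading the next block while the current one is processed), which may or may not match the authors' data flow scheme.

```python
def peak_gflops(num_pes, freq_mhz, flops_per_pe_per_cycle):
    """Peak throughput in GFLOPS, assuming every PE is busy every cycle."""
    return num_pes * freq_mhz * 1e6 * flops_per_pe_per_cycle / 1e9


def total_time(io_times, compute_times, overlap):
    """Total time to process a sequence of data blocks.

    Without overlap, each block is loaded and then computed in turn.
    With overlap, the load of block i+1 runs while block i is computed,
    so only the first load is fully exposed.
    """
    if not overlap:
        return sum(io_times) + sum(compute_times)
    total = io_times[0]
    for i, compute in enumerate(compute_times):
        next_io = io_times[i + 1] if i + 1 < len(io_times) else 0.0
        total += max(compute, next_io)
    return total


# Assumption: 8 FLOPs per PE per cycle (hypothetical, not stated in the abstract).
half_precision_peak = peak_gflops(128, 180, 8)

# Hypothetical timings (arbitrary units) for four blocks.
io = [2.0, 2.0, 2.0, 2.0]
compute = [5.0, 5.0, 5.0, 5.0]
serial = total_time(io, compute, overlap=False)
pipelined = total_time(io, compute, overlap=True)
```

With these hypothetical timings, each load is shorter than the computation it overlaps with, so only the first load adds to the total. This is the sense in which I/O time is "almost completely hidden" by computation time.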

Bibliographic details

Source
International Conference on Information Science, Parallel and Distributed Systems | 2020 | pp. 256-260 | 5 pages
Authors
Longlong Zhang; Yuanxi Peng; Ahui Huang; Xiao Hu;
Original format: PDF
Keywords
Random access memory; Optimization; Acceleration; Field programmable gate arrays; Arrays; Signal processing algorithms; Periodic structures;

Similar literature

1. Bandwidth Compression of Floating-Point Numerical Data Streams for FPGA-Based High-Performance Computing [J]. Ueno Tomohiro, Sano Kentaro, Yamamoto Satoru. ACM Transactions on Reconfigurable Technology and Systems, 2017, No. 3.
2. A FPGA-based accelerated architecture for the Continuous GRASP [J]. Nogueira Bruno, Barboza Erick. Computing, 2021, No. 7.
3. A 32-bit FPGA-based Single Precision Floating-point Hybrid CORDIC Processor Based on RISC Architecture [J]. Muhammad Nasir Ibrahim, Chen Kean Tack, Zuraimi Yahya. Australian Journal of Basic and Applied Sciences, 2014, No. 2014.
4. A universal FPGA-based floating-point matrix processor for mobile systems [C]. Wenqiang Wang, Kaiyuan Guo, Mengyuan Gu. International Conference on Field-Programmable Technology, 2014.
5. Architecture Design and Mapping Flow Development for FPGA-Accelerated Heterogeneous Computing Systems [D]. Liang, Hao. 2017.
6. FPGA-Based High-Performance Embedded Systems for Adaptive Edge Computing in Cyber-Physical Systems: The ARTICo3 Framework [O]. Alfonso Rodríguez, Juan Valverde, Jorge Portilla. 2018.
7. A Scalable Architecture for Accelerating Multi-operation and Continuous Floating-point Matrix Computing on FPGAs [O]. Longlong Zhang, Yuanxi Peng, Ahui Huang. 2020.
