High-level synthesis optimization for blocked floating-point matrix multiplication

Abstract

Over the last decade, floating-point matrix multiplication on FPGAs has been studied extensively, and both efficient architectures and detailed performance models have been developed. By design, these IP cores have a fixed footprint that does not necessarily make optimal use of all available resources. Moreover, such low-level architectures do not lend themselves easily to parameterized synthesis. In this paper, high-level synthesis is used to fine-tune the configuration parameters so as to achieve the highest performance with maximal resource utilization. An exploration strategy is presented that optimizes the use of the critical resources (DSPs and memory) for any given FPGA. To cope with the limited on-chip memory of the FPGA, the matrix multiplication is organized in blocks: the block summation is performed on the CPU while the block multiplication takes place concurrently on the logic fabric. The communication overhead between the CPU and the FPGA is minimized by streaming the blocks in a Gray-code ordering, which maximizes data reuse between consecutive block matrix products. With high-level synthesis optimization, the programmable logic operates at 93% of its theoretical peak performance, and the combined CPU-FPGA design achieves 76% of the available hardware processing speed for the floating-point multiplication of 2K × 2K matrices.
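The abstract does not spell out the exploration strategy itself. As a rough illustration only, the sketch below brute-forces a hypothetical cost model: each multiply-add unit is assumed to consume `dsp_per_mac` DSPs, one bs × bs block each of A, B and C is assumed to reside in on-chip memory, and the unroll factor is assumed to divide the block size. The function `explore` and every parameter value are hypothetical and are not taken from the paper.

```python
def explore(dsp_total, bram_bytes, dsp_per_mac, clock_hz, word_bytes=4):
    """Return (peak_flops, block_size, unroll) for the best configuration
    under a hypothetical DSP/memory cost model (not the paper's model)."""
    max_units = dsp_total // dsp_per_mac  # multiply-add units the DSP budget allows
    if max_units < 1:
        raise ValueError("not enough DSPs for a single multiply-add unit")
    best = None
    bs = 1
    # Assumed memory model: one bs x bs block each of A, B and C stays on chip.
    while 3 * bs * bs * word_bytes <= bram_bytes:
        # Assumed constraint: the unroll factor divides the block size,
        # so the unrolled inner loop needs no remainder iterations.
        unroll = max(u for u in range(1, min(max_units, bs) + 1) if bs % u == 0)
        peak = 2 * unroll * clock_hz  # one multiply and one add per unit per cycle
        candidate = (peak, bs, unroll)
        # Ties on peak are broken in favour of larger blocks (fewer transfers).
        if best is None or candidate > best:
            best = candidate
        bs += 1
    if best is None:
        raise ValueError("on-chip memory too small for a 1 x 1 block")
    return best
```

With illustrative numbers (220 DSPs, 5 DSPs per multiply-add unit, 512 KiB of on-chip memory, 100 MHz), `explore(220, 512 * 1024, 5, 100e6)` returns `(8800000000.0, 176, 44)`: all 44 units are kept busy with a 176 × 176 block. Restricting the block size to powers of two would cap the unroll factor at 32 in this toy model, which mirrors the abstract's point that a fixed-footprint core need not use every available resource.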

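The CPU/FPGA division of labour described in the abstract can be sketched as follows, assuming square matrices whose size is a multiple of the block size. A single worker thread stands in for the FPGA kernel (`fpga_block_multiply` is a hypothetical placeholder for the HLS-generated block multiplier) while the main thread performs the block summation C_ij += A_ik · B_kj for the previous product. Because of Python's GIL the two do not truly run in parallel; the sketch only shows the overlap structure.

```python
from concurrent.futures import ThreadPoolExecutor


def get_block(m, r, c, bs):
    """Copy block (r, c) of size bs x bs out of matrix m (a list of rows)."""
    return [row[c * bs:(c + 1) * bs] for row in m[r * bs:(r + 1) * bs]]


def fpga_block_multiply(a, b):
    """Hypothetical stand-in for the FPGA kernel: dense bs x bs block product."""
    bs = len(a)
    return [[sum(a[r][t] * b[t][c] for t in range(bs)) for c in range(bs)]
            for r in range(bs)]


def cpu_accumulate(C, partial, i, j, bs):
    """CPU-side block summation: C_ij += partial."""
    for r in range(bs):
        for c in range(bs):
            C[i * bs + r][j * bs + c] += partial[r][c]


def blocked_matmul(A, B, bs):
    n = len(A)
    if n % bs:
        raise ValueError("matrix size must be a multiple of the block size")
    nb = n // bs
    C = [[0.0] * n for _ in range(n)]
    schedule = [(i, j, k) for i in range(nb) for j in range(nb) for k in range(nb)]
    # One worker thread plays the FPGA; the main thread plays the CPU.
    with ThreadPoolExecutor(max_workers=1) as fpga:
        previous = None
        for i, j, k in schedule:
            future = fpga.submit(fpga_block_multiply,
                                 get_block(A, i, k, bs), get_block(B, k, j, bs))
            # While the "FPGA" computes this product, the CPU adds the previous one.
            if previous is not None:
                prev_future, pi, pj = previous
                cpu_accumulate(C, prev_future.result(), pi, pj, bs)
            previous = (future, i, j)
        if previous is not None:  # drain the last product
            prev_future, pi, pj = previous
            cpu_accumulate(C, prev_future.result(), pi, pj, bs)
    return C
```

The row-major schedule here is only a placeholder; the order in which blocks are streamed is exactly what the Gray-code ordering in the abstract is meant to improve.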

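One way to obtain the data reuse the abstract attributes to Gray-code ordering is sketched below; the exact ordering used in the paper may differ. It assumes the FPGA keeps only the most recent A block and B block, and it walks the block indices (k, i, j) in a reflected mixed-radix Gray code with k outermost, so consecutive products differ in a single index. Changing i or j reuses one operand block; only a change of k forces both to be resent. The function names are hypothetical.

```python
def gray_schedule(nb):
    """(i, j, k) block triples in a reflected mixed-radix Gray code, k outermost.

    Consecutive triples differ in exactly one index, so whenever i or j
    changes, one of the operand blocks A_ik, B_kj equals the previous one.
    """
    def reflect(digits):
        if digits == 0:
            return [()]
        sub = reflect(digits - 1)
        seq = []
        for d in range(nb):
            # Reverse every other copy so the walk never jumps back.
            part = sub if d % 2 == 0 else sub[::-1]
            seq.extend((d,) + t for t in part)
        return seq
    return [(i, j, k) for k, i, j in reflect(3)]


def row_major_schedule(nb):
    """Plain i-j-k loop nest with k innermost, for comparison."""
    return [(i, j, k) for i in range(nb) for j in range(nb) for k in range(nb)]


def count_transfers(schedule):
    """Blocks of A and B sent to the FPGA, assuming the last block of each is kept."""
    transfers = 0
    prev_a = prev_b = None
    for i, j, k in schedule:
        a, b = (i, k), (k, j)
        transfers += (a != prev_a) + (b != prev_b)
        prev_a, prev_b = a, b
    return transfers
```

For a 4 × 4 grid of blocks, `count_transfers(gray_schedule(4))` is 68 against 128 for `row_major_schedule(4)`; in general the Gray order needs nb³ + nb block transfers versus 2·nb³. With k outermost, all C_ij accumulators are live at once, which suits a design where the CPU holds the full result matrix and performs the block summation.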
Bibliographic details

Author: Erik D'Hollander
Year: 2017
Original format: PDF
Language: English

Similar documents

1. High-Level Synthesis Optimization for Blocked Floating-Point Matrix Multiplication [J]. Erik H. D'Hollander. Computer Architecture News, 2016, No. 4.
2. Analysis of Blocking and Scheduling for FPGA-Based Floating-Point Matrix Multiplication [J]. A. Khayyat, N. Manjikian. Canadian Journal of Electrical and Computer Engineering, 2014, No. 2.
3. Analysis of Blocking and Scheduling for FPGA-Based Floating-Point Matrix Multiplication [J]. Ahmad Khayyat, Naraig Manjikian. Canadian Electrical Engineering Journal, 2014, No. 2.
4. Using High-Level Synthesis to Implement the Matrix-Vector Multiplication on FPGA [C]. Alessandro Marongiu, Paolo Palazzari. International Conference ISC High Performance: International Conference on High Performance Computing, 2020.
5. Analysis-Driven Design of Parallel Floating-Point Matrix Multiplication for Implementation in Reconfigurable Logic [D]. Ahmad Khayyat, 2013.
6. Efficient Synthesis of Fmoc-Protected Phosphinic Pseudodipeptides: Building Blocks for the Synthesis of Matrix Metalloproteinase Inhibitors [O]. Manishabrata Bhowmick, Ravinder R. Sappidi, Gregg B. Fields.
7. An Optimized Floating-Point Matrix Multiplication on FPGA [O]. Ting Zhang, Cheng Xu, Tao Li, 2013.
