
High-level synthesis optimization for blocked floating-point matrix multiplication

Abstract

In the last decade, floating-point matrix multiplication on FPGAs has been studied extensively, and efficient architectures as well as detailed performance models have been developed. By design, these IP cores take a fixed footprint, which does not necessarily optimize the use of all available resources. Moreover, the low-level architectures are not easily amenable to parameterized synthesis. In this paper, high-level synthesis is used to fine-tune the configuration parameters in order to achieve the highest performance with maximal resource utilization. An exploration strategy is presented to optimize the use of critical resources (DSPs, memory) for any given FPGA. To account for the limited memory size on the FPGA, a block-oriented matrix multiplication is organized such that the block summation is done on the CPU while, simultaneously, the block multiplication occurs on the logic fabric. The communication overhead between the CPU and the FPGA is minimized by streaming the blocks in a Gray code ordering scheme, which maximizes data reuse between consecutive block matrix product calculations. Using high-level synthesis optimization, the programmable logic operates at 93% of its theoretical peak performance, and the combined CPU-FPGA design achieves 76% of the available hardware processing speed for the floating-point multiplication of 2K by 2K matrices.
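The abstract does not spell out the exact block schedule, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's implementation. It assumes square n×n matrices whose size is a multiple of the block size b, an FPGA that keeps only the most recently received A and B blocks, and a boustrophedon (reflected mixed-radix Gray code) traversal of the block indices (i, j, k), in which consecutive block products differ in exactly one index. The names `snake_gray_order`, `get_block`, `fpga_block_product` and `blocked_matmul` are hypothetical. The block product is simulated in Python rather than offloaded, and the CPU/FPGA overlap is not modeled.

```python
from typing import Iterator, List, Optional, Tuple

Matrix = List[List[float]]


def snake_gray_order(nb: int) -> Iterator[Tuple[int, int, int]]:
    """Yield block triples (i, j, k) of an nb x nb x nb block grid in a
    reflected (boustrophedon) mixed-radix Gray-code order: consecutive
    triples differ by +/-1 in exactly one index. k is the outermost index,
    so inside one k-plane a step changes only i (B[k][j] is reused) or
    only j (A[i][k] is reused)."""
    row_count = 0  # rows visited so far; its parity sets the column direction
    for k in range(nb):
        rows = range(nb) if k % 2 == 0 else range(nb - 1, -1, -1)
        for i in rows:
            cols = range(nb) if row_count % 2 == 0 else range(nb - 1, -1, -1)
            for j in cols:
                yield i, j, k
            row_count += 1


def get_block(M: Matrix, bi: int, bj: int, b: int) -> Matrix:
    """Copy the b x b block at block coordinates (bi, bj)."""
    return [row[bj * b:(bj + 1) * b] for row in M[bi * b:(bi + 1) * b]]


def fpga_block_product(a: Matrix, bm: Matrix) -> Matrix:
    """Hypothetical stand-in for the block product done on the logic fabric."""
    b = len(a)
    return [[sum(a[r][t] * bm[t][c] for t in range(b)) for c in range(b)]
            for r in range(b)]


def blocked_matmul(A: Matrix, B: Matrix, b: int) -> Tuple[Matrix, int]:
    """Return (C, blocks_streamed) for C = A * B with b x b blocks."""
    n = len(A)
    if n % b != 0:
        raise ValueError("matrix size must be a multiple of the block size")
    nb = n // b
    C = [[0.0] * n for _ in range(n)]
    streamed = 0
    prev: Optional[Tuple[int, int, int]] = None
    for i, j, k in snake_gray_order(nb):
        # Assumption: the FPGA keeps only the last A and B block, so a block
        # is streamed only if it differs from the one used in the previous step.
        if prev is None or (prev[0], prev[2]) != (i, k):
            streamed += 1  # new A[i][k]
        if prev is None or (prev[2], prev[1]) != (k, j):
            streamed += 1  # new B[k][j]
        prev = (i, j, k)
        P = fpga_block_product(get_block(A, i, k, b), get_block(B, k, j, b))
        # CPU side: accumulate the partial block product into C[i][j].
        for r in range(b):
            for c in range(b):
                C[i * b + r][j * b + c] += P[r][c]
    return C, streamed
```

For example, `blocked_matmul(A, B, 2)` on 4×4 matrices returns the product together with a count of 10 streamed blocks. In general, with nb blocks per dimension this ordering streams nb³ + nb blocks, compared with 2·nb³ if both operand blocks were resent for each of the nb³ block products; only the nb − 1 changes of k require two new blocks at once.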

Bibliographic details

  • Author

    D'Hollander Erik

  • Year: 2017
  • Original format: PDF
  • Language of full text: English
