Published in: The 24th IEEE International Symposium on Field-Programmable Custom Computing Machines

DeCO: A DSP Block Based FPGA Accelerator Overlay with Low Overhead Interconnect

Abstract

Coarse-grained FPGA overlay architectures paired with general-purpose processors offer a number of advantages for general-purpose hardware acceleration: software-like programmability, fast compilation, application portability, and improved design productivity. However, the area overheads of these overlays, particularly those with island-style interconnect, negate many of these advantages and prevent their use in practical FPGA-based systems. Crucially, the interconnect flexibility these overlay architectures provide is normally over-provisioned for accelerators based on feed-forward pipelined datapaths, which in many cases have the general shape of inverted cones. We propose DeCO, a cone-shaped cluster of functional units (FUs) connected by a simple linear interconnect. This reduces the area overhead of implementing compute kernels, extracted from compute-intensive applications and represented as directed acyclic dataflow graphs, while still allowing high data throughput. We perform design space exploration by modeling programmability overhead as a function of overlay design parameters, and compare it to the programmability overhead of island-style overlays. We observe 87% savings in LUT requirements using the proposed approach compared to DSP-block-based island-style overlays. Our experimental evaluation shows that the proposed overlay achieves a frequency of 395 MHz, close to the DSP theoretical limit on the Xilinx Zynq. We also present an automated tool flow that provides rapid, vendor-independent mapping of high-level compute kernel code to the proposed overlay.
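
To make the "inverted cone" observation concrete, here is a minimal sketch (not the authors' tool flow) that levels a feed-forward dataflow graph and checks whether its per-stage operation counts fit a cone of FUs. The names `asap_levels`, `stage_widths`, `fits_cone`, the example `KERNEL`, and the cone sizes are all hypothetical, chosen only for illustration. The sketch also assumes every edge connects adjacent stages, so it does not model pass-through of values across stages.

```python
from collections import Counter

def asap_levels(preds):
    """Give each operation the earliest stage at which all its operands are ready.

    `preds` maps an operation name to the list of its operands. Any name that
    is not a key of `preds` is treated as a primary input at stage 0.
    """
    levels = {}

    def level(node):
        if node not in preds:          # primary input, not an operation
            return 0
        if node not in levels:
            levels[node] = 1 + max((level(p) for p in preds[node]), default=0)
        return levels[node]

    for node in preds:
        level(node)
    return levels

def stage_widths(preds):
    """Number of operations in each stage, listed from stage 1 onward."""
    counts = Counter(asap_levels(preds).values())
    if not counts:
        return []
    return [counts[s] for s in range(1, max(counts) + 1)]

def fits_cone(widths, cone):
    """True if every stage of the graph fits the FU count of the matching cone stage."""
    return len(widths) <= len(cone) and all(w <= c for w, c in zip(widths, cone))

# Hypothetical kernel: (a*b + c*d) * (e*f + g*h)
KERNEL = {
    "m1": ["a", "b"], "m2": ["c", "d"], "m3": ["e", "f"], "m4": ["g", "h"],
    "s1": ["m1", "m2"], "s2": ["m3", "m4"],
    "r": ["s1", "s2"],
}

widths = stage_widths(KERNEL)          # four multiplies, two adds, one multiply
print(widths, fits_cone(widths, [4, 2, 1]), fits_cone(widths, [2, 2, 1]))
```

The stage widths shrink from the inputs toward the output, which is the shape the abstract describes. A cone whose first stage is too narrow (the second call) rejects the kernel.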