DeCO: A DSP Block Based FPGA Accelerator Overlay with Low Overhead Interconnect


Abstract

Coarse-grained FPGA overlay architectures paired with general purpose processors offer a number of advantages for general purpose hardware acceleration because of software-like programmability, fast compilation, application portability, and improved design productivity. However, the area overheads of these overlays, and in particular architectures with island-style interconnect, negate many of these advantages, preventing their use in practical FPGA-based systems. Crucially, the interconnect flexibility provided by these overlay architectures is normally over-provisioned for accelerators based on feed-forward pipelined datapaths, which in many cases have the general shape of inverted cones. We propose DeCO, a cone shaped cluster of FUs utilizing a simple linear interconnect between them. This reduces the area overheads for implementing compute kernels extracted from compute-intensive applications represented as directed acyclic dataflow graphs, while still allowing high data throughput. We perform design space exploration by modeling programmability overhead as a function of overlay design parameters, and compare to the programmability overhead of island-style overlays. We observe 87% savings in LUT requirements using the proposed approach compared to DSP block based island-style overlays. Our experimental evaluation shows that the proposed overlay exhibits an achievable frequency of 395 MHz, close to the DSP theoretical limit on the Xilinx Zynq. We also present an automated tool flow that provides a rapid and vendor-independent mapping of the high level compute kernel code to the proposed overlay.
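The "inverted cone" observation in the abstract can be illustrated with a small sketch. It levelizes a directed acyclic dataflow graph and counts the operations at each pipeline stage; a list that shrinks from first stage to last is the shape a cone-shaped cluster of FUs would match. This is not the authors' tool flow: the function name `stage_widths`, the node names, and the example kernel are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def stage_widths(nodes, edges):
    """Count operation nodes at each as-soon-as-possible level of a DAG.

    nodes: iterable of operation names (hypothetical example data).
    edges: iterable of (producer, consumer) pairs between operations.
    Assumes the graph is acyclic, as in a feed-forward datapath.
    """
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)

    level = {}

    def depth(n):
        # A node with no predecessors sits at level 0; otherwise it sits
        # one level below its deepest predecessor.
        if n not in level:
            level[n] = 1 + max((depth(p) for p in preds[n]), default=-1)
        return level[n]

    widths = defaultdict(int)
    for n in nodes:
        widths[depth(n)] += 1
    return [widths[i] for i in range(len(widths))]


# Hypothetical kernel: four multiplies reduced by a two-level adder tree.
nodes = ["m0", "m1", "m2", "m3", "a0", "a1", "a2"]
edges = [("m0", "a0"), ("m1", "a0"), ("m2", "a1"),
         ("m3", "a1"), ("a0", "a2"), ("a1", "a2")]
print(stage_widths(nodes, edges))  # [4, 2, 1]
```

For this kernel the widths narrow at every stage, so an interconnect that only forwards results from one stage to the next would suffice; a full island-style routing fabric would be over-provisioned for it.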


Bibliographic details

Source
The 24th IEEE International Symposium on Field-Programmable Custom Computing Machines | 2015 | pp. 1-8 | 8 pages
Conference location: Washington DC (US)
Authors
Abhishek Kumar Jain; Xiangwei Li; Pranjul Singhai; Douglas L. Maskell; Suhaib A. Fahmy
Author affiliations

Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore;

Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore;

Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore;

Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore;

Sch. of Eng., Univ. of Warwick, Coventry, UK;

Original format: PDF
Language: English
Keywords
Digital signal processing; Field programmable gate arrays; Kernel; Table lookup; Routing; Hardware;


Similar literature

1. Suitability of recent hardware accelerators (DSPs, FPGAs, and GPUs) for computer vision and image processing algorithms [J]. HajiRassouliha Amir, Taberner Andrew J., Nash Martyn P., Signal Processing. Image Communication: A Publication of the European Association for Signal Processing. 2018
2. The iDEA DSP Block-Based Soft Processor for FPGAs [J]. Hui Yan Cheah, Fredrik Brosser, Suhaib A. Fahmy, ACM Transactions on Reconfigurable Technology and Systems. 2014, No. 3
3. FPGA-Based Scalable and Power-Efficient Fluid Simulation using Floating-Point DSP Blocks [J]. Kentaro Sano, Satoru Yamamoto, IEEE Transactions on Parallel and Distributed Systems. 2017, No. 10
4. DeCO: A DSP Block Based FPGA Accelerator Overlay with Low Overhead Interconnect [C]. Abhishek Kumar Jain, Xiangwei Li, Pranjul Singhai, IEEE International Symposium on Field-Programmable Custom Computing Machines. 2016
5. A Hybrid Partially Reconfigurable Overlay Supporting Just-In-Time Assembly of Custom Accelerators on FPGAs [D]. Aklah, Zeyad Tariq. 2017
6. Families of FPGA-Based Accelerators for Approximate String Matching [O]. Tom Van Court, Martin C. Herbordt
7. DeCO: A DSP block based FPGA accelerator overlay with low overhead interconnect [O]. Jain, Abhishek Kumar, Li, Xiangwei, Singhai, Pranjul. 2016
