Clockwork: Resource-Efficient Static Scheduling for Multi-Rate Image Processing Applications on FPGAs

Abstract

Image processing applications can benefit tremendously from FPGA acceleration. However, hardware accelerators for these applications look very different from the programs that image processing algorithm designers are accustomed to writing. As a result, many image processing hardware compilers have been designed to generate hardware accelerators from high-level specifications of image processing algorithms. Unfortunately, all of these compilers either exclude crucial access patterns, fail to scale to realistically sized applications, or rely on a compilation process in which each stage of the application is an independently scheduled module that sends data to its consumers through FIFOs, which add resource and energy overhead and inhibit synthesis optimizations.

In this paper we present Clockwork, a new algorithm for compiling image processing applications that combines techniques from polyhedral analysis and synchronous dataflow (SDF) to overcome these limitations. Clockwork compiles the entire application into one flat, statically scheduled module. As a result, accelerators produced by Clockwork have fixed latency, cannot deadlock, and have no resource overhead from inter-stage FIFOs. Compared to a state-of-the-art stencil compiler at the same throughput, designs generated by Clockwork use on average 55% fewer LUTs, 30% fewer flip-flops, and 22% fewer BRAMs, while handling a wider range of access patterns. Clockwork scales to applications with more than 100,000 LUTs. For an application with dozens of stages, Clockwork achieves 260x the energy efficiency of an 8-thread Intel CPU, 17x that of an NVIDIA K80 GPU, and 2.4x that of an NVIDIA V100 GPU.
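
In a statically scheduled pipeline, every stage computes each pixel at a cycle fixed at compile time, so the storage between stages can be sized in advance rather than provisioned as FIFOs. The sketch below is not Clockwork's algorithm, which derives schedules with polyhedral analysis and also handles multi-rate stages. It is a deliberately simplified illustration that assumes a linear chain of single-rate stages, row-major streaming at one pixel per cycle, box-shaped stencils, and a fixed per-stage latency. The names `Stage`, `schedule_chain`, and `check_schedule` are hypothetical. Each stage is delayed just enough that its furthest-ahead input has already been produced, and the resulting worst-case value lifetime approximates the buffer needed between the two stages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """Hypothetical stage model (not Clockwork's IR): output pixel (y, x) reads
    producer pixels (y + dy, x + dx) with |dy| <= ry and |dx| <= rx."""
    name: str
    ry: int = 0
    rx: int = 0


def schedule_chain(stages, width, latency=1):
    """Assign each stage a start offset d so that its pixel (y, x) is computed
    at cycle y * width + x + d (row-major, one pixel per cycle).

    Returns (offsets, buffers). buffers[(producer, consumer)] is the worst-case
    lifetime in cycles of a producer value; since one value is produced per
    cycle, it approximates how many values must be held between the stages."""
    offsets = {stages[0].name: 0}
    buffers = {}
    for prod, cons in zip(stages, stages[1:]):
        # The consumer's furthest-ahead input is (y + ry, x + rx), produced
        # ry * width + rx cycles after pixel (y, x); add the compute latency.
        reach = cons.ry * width + cons.rx
        offsets[cons.name] = offsets[prod.name] + reach + latency
        # Producer pixel (y, x) is last read by consumer pixel (y + ry, x + rx).
        buffers[(prod.name, cons.name)] = reach + offsets[cons.name] - offsets[prod.name]
    return offsets, buffers


def check_schedule(stages, offsets, width, height, latency=1):
    """Brute-force check that every in-bounds value a consumer reads was
    produced at least `latency` cycles earlier. Returns True if legal."""
    def t(stage, y, x):
        return y * width + x + offsets[stage.name]

    for prod, cons in zip(stages, stages[1:]):
        for y in range(height):
            for x in range(width):
                for dy in range(-cons.ry, cons.ry + 1):
                    for dx in range(-cons.rx, cons.rx + 1):
                        py, px = y + dy, x + dx
                        # Out-of-bounds reads are skipped (boundary handling).
                        if 0 <= py < height and 0 <= px < width:
                            if t(prod, py, px) + latency > t(cons, y, x):
                                return False
    return True


# A two-stage separable blur on an 8-pixel-wide image.
stages = [Stage("input"), Stage("blur_x", rx=1), Stage("blur_y", ry=1)]
offsets, buffers = schedule_chain(stages, width=8)
print(offsets)  # {'input': 0, 'blur_x': 2, 'blur_y': 11}
print(buffers)  # {('input', 'blur_x'): 3, ('blur_x', 'blur_y'): 17}
print(check_schedule(stages, offsets, width=8, height=6))  # True
```

Because every offset is a compile-time constant, the pipeline in this toy model has a fixed latency (11 cycles from the first input pixel to the first `blur_y` pixel). The horizontal 3-tap stage needs only a 3-entry shift register, while the vertical 3-tap stage needs about two image rows (2 × 8 + 1 = 17 values). Lowering any offset by one cycle makes `check_schedule` fail, which shows the computed delays are minimal under these assumptions.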

Bibliographic Details

Source: IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2021, pp. 186-194 (9 pages)
Authors: Dillon Huff; Steve Dai; Pat Hanrahan
Original format: PDF
Keywords: Scheduling algorithms; Pipelines; Graphics processing units; Writing; System recovery; Throughput; Hardware

Similar Literature

1. A New FPGA and Programmable SoC Based VLSI Architecture for Histogram Generation of Grayscale Images for Image Processing Applications [J]. Sambaran Hazra, Sudip Ghosh, Santi P. Maity. Procedia Computer Science, 2016, No. 1.

2. FPGA-Based Processor Acceleration for Image Processing Applications [J]. Fahad Siddiqui, Sam Amiri, Umar Ibrahim Minhas. Journal of Imaging, 2019, No. 1.

3. FPGA-Based Soft-Core Processors for Image Processing Applications [J]. Amiri Moslem, Siddiqui Fahad Manzoor, Kelly Colm. Journal of Signal Processing Systems for Signal, Image, and Video Technology, 2017, No. 1.

4. Static scheduling of multi-rate and cyclo-static DSP-applications [C]. Bilsen, G., Engels, . 1994.

5. An FPGA based multi-spectrum data fusion and image processing method with application to embedded ladar imaging [D]. Sliney, Philip Lawrence, II. 2005.

6. Efficient Smart CMOS Camera Based on FPGAs Oriented to Embedded Image Processing [O]. Ignacio Bravo, Javier Baliñas, Alfredo Gardel. 2011.

7. Model-Based Synthesis and Optimization of Static Multi-Rate Image Processing Algorithms [O]. Joachim Keinert, Hritam Dutta, Frank Hannig. 2009.
