This paper presents a workgroup synthesis mechanism that compiles OpenCL kernels to FPGA-based accelerators embedded in a multi-core CPU system-on-a-chip (SoC). The OpenCL kernels considered in this paper exhibit regular data access patterns. To cope with the limited internal memory of embedded FPGAs, the workgroup synthesis uses a novel data access pattern formulation to describe the parallelism already provided by the OpenCL kernels. To validate the proposed technique, an OpenCL framework prototype is built around a source-to-source compiler that transforms the OpenCL kernel into C/C++ code; vendor-specific high-level synthesis tools then convert the C/C++ code into an FPGA bitstream. Results on popular real applications show up to 89.8% improvement in execution time compared with commercial FPGA OpenCL implementations.
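As a rough illustration of the kind of C/C++ code a source-to-source step could hand to a high-level synthesis tool, the sketch below rewrites a simple vector-add kernel so that one workgroup becomes an explicit loop over work-items operating on small local tiles. This is not the paper's compiler output or its data access pattern formulation: the kernel, the names `vadd_workgroup` and `vadd`, and the workgroup size `WG_SIZE` are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical workgroup size; a real flow would take it from the launch configuration.
constexpr std::size_t WG_SIZE = 4;

// OpenCL kernel assumed for this example (regular, unit-stride access):
//   __kernel void vadd(__global const int* a, __global const int* b, __global int* c) {
//       size_t i = get_global_id(0);
//       c[i] = a[i] + b[i];
//   }

// One workgroup as plain C++: stage a tile into small local buffers
// (standing in for limited on-chip memory), compute, then write back.
void vadd_workgroup(const int* a, const int* b, int* c, std::size_t group_id) {
    int tile_a[WG_SIZE];
    int tile_b[WG_SIZE];
    int tile_c[WG_SIZE];
    const std::size_t base = group_id * WG_SIZE;

    // Read phase: the regular access pattern lets the whole tile be fetched up front.
    for (std::size_t lid = 0; lid < WG_SIZE; ++lid) {
        tile_a[lid] = a[base + lid];
        tile_b[lid] = b[base + lid];
    }
    // Compute phase: each iteration plays the role of one work-item.
    for (std::size_t lid = 0; lid < WG_SIZE; ++lid) {
        tile_c[lid] = tile_a[lid] + tile_b[lid];
    }
    // Write phase: copy the result tile back to global memory.
    for (std::size_t lid = 0; lid < WG_SIZE; ++lid) {
        c[base + lid] = tile_c[lid];
    }
}

// Host-side driver that launches the workgroups one after another.
std::vector<int> vadd(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    assert(a.size() % WG_SIZE == 0);  // sketch only handles whole workgroups
    std::vector<int> c(a.size());
    for (std::size_t g = 0; g < a.size() / WG_SIZE; ++g) {
        vadd_workgroup(a.data(), b.data(), c.data(), g);
    }
    return c;
}
```

Calling `vadd` on two equal-length vectors whose size is a multiple of `WG_SIZE` returns their element-wise sum. Because the work-item loops have fixed bounds and no cross-iteration dependences, an HLS tool could in principle unroll or pipeline them, which is one way the parallelism already present in the kernel might be exposed.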