Conference: IEEE High Performance Extreme Computing Conference

Unlocking Performance-Programmability by Penetrating the Intel FPGA OpenCL Toolflow


Abstract

Improved support for OpenCL has been an important step towards the mainstream adoption of FPGAs as compute resources. Current research has shown, however, that the programmability gained from OpenCL typically comes at a significant cost in performance, with the latter falling below that of hand-coded HDL, GPU, and even CPU designs. This can primarily be attributed to 1) constrained deployment opportunities, 2) long testing time-frames, and 3) limitations of the Board Support Package (BSP). We address these challenges by penetrating the toolflow and utilizing OpenCL-generated HDL (OpenCL-HDL), which is created as an initial step during full compilation. OpenCL-HDL can be used as an intermediate stage in the design process to get better resource/latency estimates and perform RTL simulations. It can also be carved out and used as a building block for an existing HDL system. In this work, we present the process of generating, isolating, and re-interfacing OpenCL-HDL. We first propose a kernel template which reliably exploits parallelism opportunities and ensures all compute pipelines are implemented as a single HDL module. We then outline the process of identifying this module among the thousands of lines of compiler-generated code. Finally, we categorize the different types of interfaces and present methods for connecting/bypassing them in order to support integration into an existing HDL shell. We evaluate our approach using a number of benchmarks from the Rodinia suite and Molecular Dynamics simulations. Across all benchmarks, our OpenCL-HDL implementations show average speedups of 37x, 4.8x, and 3.5x over existing FPGA/OpenCL, GPU, and FPGA/Verilog designs, respectively. We demonstrate that OpenCL-HDL is able to deliver hand-coded HDL-like performance with significantly less development effort and with competitive resource overhead.
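The abstract mentions locating a single compute module among thousands of lines of compiler-generated code but does not say how this is done. The sketch below is only an illustration of what such a search could look like, not the authors' method. It assumes the generated HDL is Verilog and that the module of interest carries the kernel name in its identifier; the helper names (`list_modules`, `find_candidates`) and the sample module names are hypothetical.

```python
import re

# Matches the start of a Verilog module declaration and captures its name.
MODULE_RE = re.compile(r"^\s*module\s+([A-Za-z_][A-Za-z0-9_$]*)")
END_RE = re.compile(r"^\s*endmodule\b")


def list_modules(verilog_text):
    """Return [(module_name, line_count)] for each module...endmodule block."""
    modules = []
    name, start = None, 0
    for lineno, line in enumerate(verilog_text.splitlines(), start=1):
        m = MODULE_RE.match(line)
        if m and name is None:
            name, start = m.group(1), lineno
        elif name is not None and END_RE.match(line):
            modules.append((name, lineno - start + 1))
            name = None
    return modules


def find_candidates(verilog_text, kernel_name):
    """Modules whose name contains kernel_name, largest first (assumed heuristic)."""
    hits = [(n, c) for n, c in list_modules(verilog_text) if kernel_name in n]
    return sorted(hits, key=lambda nc: nc[1], reverse=True)


# Hypothetical stand-in for generated HDL; real output is far larger
# and its naming scheme may differ.
SAMPLE = """\
module vadd_wrapper(input clk);
  vadd_basic_block_0 bb0(.clk(clk));
endmodule

module vadd_basic_block_0(input clk);
  reg [31:0] acc;
  always @(posedge clk) acc <= acc + 1;
endmodule

module dma_engine(input clk);
endmodule
"""

candidates = find_candidates(SAMPLE, "vadd")
print(candidates)
```

Ranking by line count is a crude proxy for "the module that holds the compute pipelines"; in practice one would likely also inspect the instantiation hierarchy before isolating a module.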
