Unlocking Performance-Programmability by Penetrating the Intel FPGA OpenCL Toolflow

Abstract

Improved support for OpenCL has been an important step towards the mainstream adoption of FPGAs as compute resources. Current research has shown, however, that the programmability gained from OpenCL typically comes at a significant cost in performance, with performance falling below that of hand-coded HDL, GPU, and even CPU designs. This can primarily be attributed to 1) constrained deployment opportunities, 2) long testing time-frames, and 3) limitations of the Board Support Package (BSP). We address these challenges by penetrating the toolflow and utilizing OpenCL-generated HDL (OpenCL-HDL), which is created as an initial step of the full compilation. OpenCL-HDL can be used as an intermediate stage in the design process to obtain better resource/latency estimates and to perform RTL simulations. It can also be carved out and used as a building block in an existing HDL system. In this work, we present the process of generating, isolating, and re-interfacing OpenCL-HDL. We first propose a kernel template that reliably exploits parallelism opportunities and ensures that all compute pipelines are implemented as a single HDL module. We then outline the process of identifying this module among the thousands of lines of compiler-generated code. Finally, we categorize the different types of interfaces and present methods for connecting or bypassing them in order to support integration into an existing HDL shell. We evaluate our approach using a number of benchmarks from the Rodinia suite and Molecular Dynamics simulations. Our OpenCL-HDL implementations of all benchmarks show average speedups of 37x, 4.8x, and 3.5x over existing FPGA/OpenCL, GPU, and FPGA/Verilog designs, respectively. We demonstrate that OpenCL-HDL is able to deliver performance comparable to hand-coded HDL with significantly less development effort and competitive resource overhead.
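
The abstract states that the single compute module must be identified among thousands of lines of compiler-generated code, but does not spell out the procedure here. As an illustration only, the Python sketch below shows one plausible heuristic, not the authors' method: split the generated Verilog into modules, record which modules instantiate which, and rank the modules whose names contain the kernel name by how many submodules they instantiate. It assumes non-nested modules, no `module`/`endmodule` keywords inside comments or strings, and that generated module names contain the kernel name. The function names and the toy module names (`top_wrapper`, `my_kernel_function`, and so on) are hypothetical and do not reflect the Intel compiler's actual naming scheme.

```python
import re
from dataclasses import dataclass, field

# Verilog module header at the start of a line, e.g. "module foo (" or "module foo #(".
MODULE_RE = re.compile(r"^[ \t]*module\s+([A-Za-z_][A-Za-z0-9_$]*)", re.MULTILINE)
# "endmodule" at the start of a line.
ENDMODULE_RE = re.compile(r"^[ \t]*endmodule\b", re.MULTILINE)


@dataclass
class ModuleInfo:
    name: str
    body: str
    num_lines: int
    children: set = field(default_factory=set)  # names of modules instantiated here


def split_modules(verilog_src: str) -> dict:
    """Map module name -> ModuleInfo. Assumes modules are not nested."""
    modules = {}
    for m in MODULE_RE.finditer(verilog_src):
        end = ENDMODULE_RE.search(verilog_src, m.end())
        if end is None:  # unterminated module: stop scanning
            break
        body = verilog_src[m.start():end.end()]
        modules[m.group(1)] = ModuleInfo(m.group(1), body, body.count("\n") + 1)
    return modules


def find_instantiations(modules: dict) -> None:
    """Fill ModuleInfo.children by matching '<child> inst (' or '<child> #(' lines."""
    for parent in modules.values():
        # Skip the parent's own header line.
        rest = parent.body.split("\n", 1)[1] if "\n" in parent.body else ""
        for child in modules:
            if child == parent.name:
                continue
            pat = re.compile(rf"^[ \t]*{re.escape(child)}(\s*#|\s+[A-Za-z_])", re.MULTILINE)
            if pat.search(rest):
                parent.children.add(child)


def locate_kernel_module(verilog_src: str, kernel_name: str) -> list:
    """Heuristic (an assumption, not the paper's procedure): among modules whose
    name contains kernel_name, rank most submodules first, then largest."""
    modules = split_modules(verilog_src)
    find_instantiations(modules)
    candidates = [m for m in modules.values() if kernel_name.lower() in m.name.lower()]
    candidates.sort(key=lambda m: (-len(m.children), -m.num_lines, m.name))
    return [m.name for m in candidates]


# Toy input with made-up module names, for illustration only.
TOY_SRC = """
module top_wrapper(input clk);
  my_kernel_function k0 (.clk(clk));
endmodule

module my_kernel_function(input clk);
  my_kernel_basic_block_0 bb0 (.clk(clk));
  my_kernel_basic_block_1 #(.W(8)) bb1 (.clk(clk));
endmodule

module my_kernel_basic_block_0(input clk);
endmodule

module my_kernel_basic_block_1(input clk);
endmodule
"""

if __name__ == "__main__":
    print(locate_kernel_module(TOY_SRC, "my_kernel"))
```

On the toy input, the script prints `['my_kernel_function', 'my_kernel_basic_block_0', 'my_kernel_basic_block_1']`. Regular expressions keep the sketch short; a real flow would more likely use a proper Verilog parser to cope with comments, macros, and the compiler's actual naming conventions.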

Bibliographic Information

Source
IEEE High Performance Extreme Computing Conference | 2018 | pp. 1-8 | 8 pages
Authors
Ahmed Sanaullah; Martin C. Herbordt
Original format: PDF
Keywords
Kernel; Hardware design languages; Pipelines; Field programmable gate arrays; Optimization; Standards; Graphics processing units

Similar Literature

1. 3D Tomography Back-Projection Parallelization on Intel FPGAs Using OpenCL [J]. Maxime Martelli, Nicolas Gac, Alain Mérigot. Journal of Signal Processing Systems for Signal, Image, and Video Technology, 2019, no. 7.
2. An OpenCL-based parallel acceleration of a Sobel edge detection algorithm using Intel FPGA technology [J]. Abedalmuhdi Almomany, Ahmad Al-Omari, Amin Jarrah. South African Computer Journal, 2020, no. 1.
3. Intel OpenCLを用いたディープニューラルネットワークのFPGA実現に関して (On the FPGA Implementation of Deep Neural Networks Using Intel OpenCL) [J]. 宇山拓夢, 藤井智也, 米川晴義. 電子情報通信学会技術研究報告. リコンフィギャラブルシステム (IEICE Technical Report, Reconfigurable Systems), 2017, no. 379.
4. Unlocking Performance-Programmability by Penetrating the Intel FPGA OpenCL Toolflow [C]. Ahmed Sanaullah, Martin C. Herbordt. IEEE High Performance Extreme Computing Conference, 2018.
5. Acceleration of k-Nearest Neighbor and SRAD Algorithms Using Intel FPGA SDK for OpenCL [D]. Liyuan Liu, 2018.
6. EDSSA: An Encoder-Decoder Semantic Segmentation Networks Accelerator on OpenCL-Based FPGA Platform [O]. Hongzhi Huang, Yakun Wu, Mengqi Yu, 2020.
7. 3D Tomography Back-Projection Parallelization on Intel FPGAs Using OpenCL [O]. Maxime Martelli, Nicolas Gac, Alain Mérigot, 2018.
