International Conference on Field Programmable Logic and Applications

Embracing Diversity: Enhanced DSP Blocks for Low-Precision Deep Learning on FPGAs



Abstract

The use of reduced precision in Deep Learning (DL) inference tasks has recently been shown to significantly improve accelerator performance and to greatly reduce both model memory footprint and the required external memory bandwidth. With appropriate network retuning, reduced-precision networks can achieve accuracy close to, or equal to, that of full-precision floating-point models. Given the wide spectrum of precisions used in DL inference, FPGAs' ability to create custom bit-width datapaths gives them an advantage over other acceleration platforms in this domain. However, the embedded DSP blocks in the latest Intel and Xilinx FPGAs do not natively support precisions below 18-bit and thus cannot efficiently pack low-precision multiplications, leaving the DSP blocks under-utilized. In this work, we present an enhanced DSP block that can efficiently pack 2× as many 9-bit and 4× as many 4-bit multiplications as the baseline Arria-10-like DSP block, at the cost of a 12% block area overhead, which amounts to only a 0.6% increase in total FPGA core area. We quantify the performance gains of using this enhanced DSP block in two state-of-the-art convolutional neural network accelerators on three different models: AlexNet, VGG-16, and ResNet-50. On average, the new DSP block improved the computational performance of the 8-bit and 4-bit accelerators by 1.32× and 1.6×, while reducing the utilized chip area by 15% and 30%, respectively.
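The packing gains described above rest on a well-known arithmetic trick: two narrow multiplications that share one operand can be carried out with a single wide multiplier by concatenating the two independent inputs with guard bits. The sketch below illustrates the idea in software for unsigned 8-bit operands; it is an assumption-laden illustration of the general technique, not the paper's actual DSP block design (which handles this in hardware, including signed operands and accumulation).

```python
def packed_dual_multiply(a: int, b: int, c: int):
    """Compute (a*c, b*c) for unsigned 8-bit a, b, c with ONE wide multiply.

    The trick behind low-precision packing: place `a` and `b` in disjoint
    bit fields of a single wide operand, multiply once by the shared
    operand `c`, then slice the two partial products back out.
    """
    assert 0 <= a < 256 and 0 <= b < 256 and 0 <= c < 256
    SHIFT = 18  # b*c fits in 16 bits; 2 extra guard bits keep fields apart
    packed = (a << SHIFT) | b      # one wide operand holding both inputs
    product = packed * c           # the single wide multiplication
    return product >> SHIFT, product & ((1 << SHIFT) - 1)
```

The trick only halves multiplier usage when one operand is shared between two products, which is the common case in convolution layers (one activation multiplied by several filter weights, or vice versa); supporting fully independent operand pairs, as the enhanced DSP block does, requires changes to the block's internal datapath rather than software packing alone.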

