IEEE Latin American Symposium on Circuits and Systems

FPGA implementation of a feedforward neural network-based classifier using the xQuant technique


Abstract

For decades, feedforward neural network (FNN) based classifiers have been used extensively in classification problems such as image and speech recognition. The inherently parallel nature of the FNN classifier makes it a good candidate for hardware implementation to obtain a performance speed-up, since most of its computations are matrix-vector multiplications, where the input images, arranged as vectors, are multiplied by a matrix of parameters learned for a specific set of images. Nevertheless, when the number of parameters of an FNN classifier is large, embedding it in an FPGA becomes a challenging task. This work presents a hardware implementation of a quantisation strategy called xQuant. xQuant was developed to reduce the FNN implementation cost by approximating each floating-point parameter by its nearest power of two and replacing floating-point multiplications with bit-shift operations. Using the FNN classifier learning algorithm LAST (Learning Algorithm for Soft-Thresholding), a classifier for a texture classification problem was trained and used as a case study. Both the floating-point and the xQuant architectures were implemented on a Xilinx Zynq-7020 SoC, and the results show a threefold increase in performance and a large reduction in FPGA resources: a 4x reduction in LUTs, a 7.4x reduction in flip-flops, a 10x reduction in RAM blocks, and the elimination of DSP blocks.
