Data Optimization CNN Accelerator Design on FPGA

Abstract

Image understanding is becoming a vital feature in a growing range of applications, from medical diagnostics to autonomous vehicles. Many of these applications demand embedded solutions that integrate into existing systems under power constraints and tight real-time requirements. Convolutional Neural Networks (CNNs) presently achieve record-breaking accuracy in all image understanding benchmarks, but have very high computational complexity. Modern high-end FPGA generations feature hundreds of thousands of configurable logic blocks and additionally include an abundance of hardened functional units that enable fast and efficient implementations of common functions. Many researchers have proposed CNN accelerator prototypes on FPGAs. One problem with the state-of-the-art designs, however, is that they do not handle data dependence well, and data dependence is an important factor affecting accelerator performance. Current designs address the data dependence problem by adding hardware modules on the FPGA, but this approach has little effect and increases hardware complexity. In this paper, we propose an optimization of the data arrangement in CNNs that resolves the data dependence by rearranging the data. The rearranged data is stored in a hardware-friendly form. In this way, our accelerator can apply pipelining more effectively than current designs. We validate our approach on a Xilinx Zynq XC-7Z045 board. The experimental results show that our approach has clear advantages in hardware resource consumption and bandwidth compared to state-of-the-art designs.
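
The abstract does not describe the rearrangement scheme itself, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes an im2col-style layout, in which every convolution window is copied into its own contiguous row so that each output value depends on exactly one row and the rows can be consumed in order by a pipeline. All function names here are hypothetical.

```python
def conv2d_direct(image, kernel):
    """Reference sliding-window convolution (valid padding, stride 1).

    As is usual for CNNs, the kernel is not flipped (cross-correlation).
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def rearrange_windows(image, k):
    """Copy each k x k window into its own contiguous row (assumed layout)."""
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [image[r + i][c + j] for i in range(k) for j in range(k)]
        for r in range(out_h)
        for c in range(out_w)
    ]


def conv2d_streamed(rows, kernel, out_w):
    """Consume rearranged rows in order; each output needs only one row."""
    flat_kernel = [w for kernel_row in kernel for w in kernel_row]
    flat_out = [sum(x * w for x, w in zip(row, flat_kernel)) for row in rows]
    # Fold the flat output stream back into a 2D feature map.
    return [flat_out[i:i + out_w] for i in range(0, len(flat_out), out_w)]


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 0], [0, -1]]
rows = rearrange_windows(image, 2)
result = conv2d_streamed(rows, kernel, out_w=3)
```

The streamed version produces the same feature map as the direct one, but its inner loop reads a single contiguous row per output instead of gathering values from several image rows. The cost of this particular layout is duplicated input data, which is one reason a real design might choose a different arrangement.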