Conference: IEEE International Symposium on Circuits and Systems

MulMapper: Towards an Automated FPGA-Based CNN Processor Generator Based on a Dynamic Design Space Exploration

Abstract

Many enterprises are adopting deep learning algorithms in their everyday tasks faster than ever. Convolutional Neural Networks (CNNs) in particular are widely used because of their impressive performance in various application areas. FPGAs, meanwhile, are becoming a promising hardware platform for deep learning algorithms, including CNNs. However, an optimized and efficient FPGA design requires an expert with hardware design skills. This is a particular challenge for deep learning practitioners who would like to accelerate their algorithms without acquiring the underlying hardware knowledge that FPGA implementation demands. In this work we propose an automated framework, MulMapper, that generates a functional, synthesized CNN processor hardware IP (using Vivado HLS) for Zynq-based FPGAs from a Caffe-based CNN definition file. We created a novel, dynamic design space that uses Target Device Resource, Target Core Mode and Target Data Width as its dimensions. MulMapper explores the design space along these three dimensions and proposes the optimum design points. We tested the MulMapper framework on three common CNN architectures: LeNet, CNP and CIFAR-10. The results verify that early-stage MulMapper can synthesize resource-optimized CNN processor hardware IP suitable for many regular CNN variants. Compared with the state of the art, architectures generated using MulMapper achieve reductions of up to 25-29× in DSP48 usage and 13-20× in on-chip memory, with performance of up to 0.35 GOP/sec.
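The abstract does not describe how the three-dimensional exploration is carried out, so the following is only a minimal sketch of the general idea, not MulMapper's actual algorithm. It enumerates a grid over stand-ins for the three named dimensions and, for each device budget, keeps the feasible point with the most parallelism. The class `DesignPoint`, the mode names, the multiplier counts and the DSP cost model are all hypothetical assumptions introduced for illustration.

```python
from dataclasses import dataclass
from itertools import product

# Every name and number in this sketch is an illustrative assumption,
# not a value or interface taken from the MulMapper paper.


@dataclass(frozen=True)
class DesignPoint:
    dsp_budget: int   # stand-in for "Target Device Resource" (DSP48 slices available)
    core_mode: str    # stand-in for "Target Core Mode"
    data_width: int   # stand-in for "Target Data Width", in bits


# Hypothetical number of multipliers instantiated by each core mode.
MULTIPLIERS_PER_MODE = {"serial": 1, "partial": 8, "parallel": 32}


def dsp_cost(point):
    """Toy cost model: one DSP48 per multiplier up to 18 bits, two above."""
    per_multiplier = 1 if point.data_width <= 18 else 2
    return MULTIPLIERS_PER_MODE[point.core_mode] * per_multiplier


def explore(dsp_budgets, core_modes, data_widths):
    """Return, for each budget, the feasible point with the most multipliers.

    Ties are broken by lower DSP cost, then by enumeration order.
    Budgets with no feasible point are omitted from the result.
    """
    best = {}
    for budget, mode, width in product(dsp_budgets, core_modes, data_widths):
        point = DesignPoint(budget, mode, width)
        cost = dsp_cost(point)
        if cost > budget:
            continue  # does not fit on the target device
        key = (MULTIPLIERS_PER_MODE[mode], -cost)
        if budget not in best or key > best[budget][0]:
            best[budget] = (key, point)
    return {budget: point for budget, (_, point) in best.items()}


if __name__ == "__main__":
    chosen = explore([16, 80], ["serial", "partial", "parallel"], [8, 16, 32])
    for budget, point in sorted(chosen.items()):
        print(budget, point.core_mode, point.data_width)
```

A real generator would replace the toy cost model with resource estimates derived from the network definition and the synthesis tool, and would likely weigh on-chip memory and throughput as well as DSP usage.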
