IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

DNNVM: End-to-End Compiler Leveraging Heterogeneous Optimizations on FPGA-Based CNN Accelerators

Abstract

The convolutional neural network (CNN) has become a state-of-the-art method in several artificial intelligence domains in recent years. Increasingly complex CNN models are both computation-bound and I/O-bound. Field-programmable gate array (FPGA)-based accelerators driven by a custom instruction set architecture (ISA) strike a balance between generality and efficiency, but much room remains for optimization. We propose the deep neural network virtual machine (DNNVM), a full-stack compiler that integrates optimizers for graphs, loops, and data layouts with an assembler, a runtime supporter, and a validation environment. DNNVM works in the context of deep learning frameworks and transforms CNN models into a directed acyclic graph, XGraph. Based on XGraph, we transform the optimization challenges for both data layout and pipelining into graph-level problems. DNNVM enumerates all potentially profitable fusion opportunities with a heuristic subgraph isomorphism algorithm to leverage pipeline and data-layout optimizations, and searches for the best execution strategy for the whole computing graph. On the Xilinx ZU2@330 MHz and ZU9@330 MHz, naive implementations without these optimizations already match state-of-the-art performance on our benchmarks, and throughput improves by up to 1.26x when the heterogeneous optimizations in DNNVM are applied. Finally, on the ZU9@330 MHz, we achieve state-of-the-art performance for VGG and ResNet50: a throughput of 2.82 TOPs/s and an energy efficiency of 123.7 GOPs/s/W for VGG, 1.38 TOPs/s for ResNet50, and 1.41 TOPs/s for GoogLeNet.
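The graph-level fusion search the abstract describes can be sketched in miniature: represent the network as a DAG of operators, enumerate fusible operator chains by pattern matching (a much-simplified stand-in for DNNVM's heuristic subgraph isomorphism algorithm), and greedily select non-overlapping fusion groups. All names, templates, and the greedy selection heuristic below are illustrative assumptions, not details from the paper.

```python
# Toy sketch of graph-level operator fusion on an XGraph-like DAG.
# Templates, op names, and the greedy plan selection are hypothetical,
# standing in for DNNVM's subgraph-isomorphism search, not reproducing it.
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    op: str
    succs: list = field(default_factory=list)  # names of downstream nodes

# A tiny CNN fragment as a DAG: conv -> relu -> pool -> conv -> relu
graph = {
    "c1": Node("c1", "conv", ["r1"]),
    "r1": Node("r1", "relu", ["p1"]),
    "p1": Node("p1", "pool", ["c2"]),
    "c2": Node("c2", "conv", ["r2"]),
    "r2": Node("r2", "relu", []),
}

# Fusion templates: op sequences the accelerator could run as one pipeline.
TEMPLATES = [("conv", "relu", "pool"), ("conv", "relu")]

def match_chains(graph, template):
    """Enumerate linear chains whose op sequence matches a template."""
    matches = []
    for start in graph.values():
        chain, node = [start], start
        for want in template[1:]:
            nxt = [graph[s] for s in node.succs if graph[s].op == want]
            if len(nxt) != 1:          # no (or ambiguous) continuation
                chain = None
                break
            node = nxt[0]
            chain.append(node)
        if chain and tuple(n.op for n in chain) == template:
            matches.append([n.name for n in chain])
    return matches

def pick_plan(candidates):
    """Greedily choose non-overlapping fusion groups, longest first
    (a crude stand-in for searching over execution strategies)."""
    used, plan = set(), []
    for group in sorted(candidates, key=len, reverse=True):
        if not used & set(group):
            plan.append(group)
            used |= set(group)
    return plan

candidates = [m for t in TEMPLATES for m in match_chains(graph, t)]
plan = pick_plan(candidates)
# Fuses conv+relu+pool into one pipeline stage and conv+relu into another.
```

A real compiler would score each candidate with a cost model (on-chip buffer pressure, data-layout conversions, pipeline depth) rather than preferring the longest chain, but the enumerate-then-select structure is the same.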
