2019 56th ACM/IEEE Design Automation Conference (DAC)

On the Complexity Reduction of Dense Layers from O(N²) to O(N log N) with Cyclic Sparsely Connected Layers



Abstract

In deep neural networks (DNNs), model size is an important factor affecting performance, energy efficiency and scalability. Recent works on weight pruning have shown significant reduction in model size at the expense of irregularity in the DNN architecture, which necessitates additional indexing memory to address non-zero weights, thereby increasing chip size, energy consumption and delays. In this paper, we propose cyclic sparsely connected (CSC) layers, with a memory/computation complexity of O(N log N), that can be used as an overlay for fully connected (FC) layers, whose O(N²) parameters can dominate the parameters of the entire DNN model. The CSC layers are composed of a few sequential layers, referred to as support layers, which together provide full connectivity between the inputs and outputs of each CSC layer. We introduce an algorithm that trains models whose FC layers are replaced with CSC layers in a bottom-up fashion, incrementally increasing the CSC layers' characteristics, such as connectivity and number of synapses, to achieve the desired accuracy at a given compression rate. One advantage of the CSC layers is that no indexing of non-zero weights is required. Our experimental results using AlexNet on ImageNet and LeNet-300-100 on MNIST indicate that by substituting FC layers with CSC layers, we can achieve 10× to 46× compression within a margin of 2% accuracy loss, which is comparable to non-structural pruning methods. A scalable parallel hardware architecture to implement CSC layers, and an equivalent scalable parallel architecture to efficiently implement non-structurally pruned FC layers, are designed and fully placed and routed on an Artix-7 FPGA and in 65 nm CMOS ASIC technology for the LeNet-300-100 model. The results indicate that, when running at the same frequency and with an equal compression rate, the proposed CSC hardware outperforms the conventional non-structurally pruned architecture by roughly 2× in power, energy, area and resource utilization.
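The abstract does not spell out the exact cyclic pattern of the support layers, so the NumPy sketch below only illustrates the general idea under assumptions: each support layer gives every output a small cyclic fan-in, and stacking about log_f(N) such layers yields full input-output connectivity with O(N log N) weights in total. The helper names (support_mask, csc_masks), the fan-in parameter and the specific offset pattern ((j + k·f^level) mod N) are illustrative choices, not the authors' construction.

```python
import numpy as np

def support_mask(n, fan_in, level):
    """Connectivity mask of one support layer (assumed pattern):
    output j connects to the fan_in inputs (j + k * fan_in**level) mod n."""
    mask = np.zeros((n, n), dtype=bool)
    j = np.arange(n)
    for k in range(fan_in):
        mask[j, (j + k * fan_in**level) % n] = True
    return mask

def csc_masks(n, fan_in):
    """Masks for ceil(log_fan_in(n)) support layers; each holds n * fan_in
    weights, so the whole CSC layer stores O(n log n) weights instead of n**2."""
    levels = int(np.ceil(np.log(n) / np.log(fan_in)))
    return [support_mask(n, fan_in, l) for l in range(levels)]

if __name__ == "__main__":
    n, fan_in = 256, 4
    masks = csc_masks(n, fan_in)
    # Compose the boolean masks to check that every output of the last
    # support layer is (indirectly) reachable from every input.
    reach = masks[0]
    for m in masks[1:]:
        reach = (m.astype(int) @ reach.astype(int)) > 0
    print("support layers:", len(masks))
    print("weights in CSC layer:", int(sum(m.sum() for m in masks)))
    print("fully connected overall:", bool(reach.all()))
```

Running this sketch with N = 256 and fan-in 4 reports 4 support layers and 4096 weights (versus 65 536 for a dense layer of the same size) while confirming full end-to-end connectivity, which mirrors the O(N log N) versus O(N²) trade-off described above.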


