International Symposium on Microarchitecture

CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices

Abstract

Large-scale deep neural networks (DNNs) are both compute and memory intensive. As the size of DNNs continues to grow, it is critical to improve their energy efficiency and performance while maintaining accuracy. For DNNs, the model size is an important factor affecting performance, scalability, and energy efficiency. Weight pruning achieves good compression ratios but suffers from three drawbacks: 1) the irregular network structure after pruning, which hurts performance and throughput; 2) increased training complexity; and 3) the lack of a rigorous guarantee on compression ratio and inference accuracy. To overcome these limitations, this paper proposes CirCNN, a principled approach to representing weights and processing neural networks using block-circulant matrices. CirCNN utilizes Fast Fourier Transform (FFT)-based fast multiplication, simultaneously reducing the computational complexity (in both inference and training) from O(n²) to O(n log n) and the storage complexity from O(n²) to O(n), with negligible accuracy loss. Compared to other approaches, CirCNN is distinct due to its mathematical rigor: DNNs based on CirCNN can converge to the same “effectiveness” as uncompressed DNNs. We propose the CirCNN architecture, a universal DNN inference engine that can be implemented on various hardware/software platforms with a configurable network architecture (e.g., layer type, size, scales, etc.). In the CirCNN architecture: 1) due to its recursive property, the FFT can be used as the key computing kernel, which ensures universal and small-footprint implementations; 2) the compressed but regular network structure avoids the pitfalls of network pruning and facilitates high performance and throughput with a highly pipelined and parallel design. To demonstrate the performance and energy efficiency, we test CirCNN on FPGA, ASIC, and embedded processors. Our results show that the CirCNN architecture achieves very high energy efficiency and performance with a small hardware footprint. Based on the FPGA implementation and ASIC synthesis results, CirCNN achieves 6-102X energy efficiency improvements compared with the best state-of-the-art results.
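To make the complexity claim concrete, the following is a minimal NumPy sketch (not the paper's implementation) of the FFT-based products that CirCNN builds on: a circulant matrix is fully determined by its first column c, so C·x reduces to a circular convolution computable via FFT in O(n log n) time with O(n) storage, and a block-circulant layer applies the same trick per k×k block, summing in the Fourier domain. Function names and the block layout here are illustrative assumptions.

```python
import numpy as np

def circulant_matvec_fft(c, x):
    # C @ x, where C is the n x n circulant matrix whose first column is c.
    # Computed as an n-point circular convolution in O(n log n) time,
    # storing only the n-entry vector c instead of n^2 weights.
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(blocks, x, k):
    # Illustrative block-circulant layer: y_i = IFFT( sum_j FFT(w_ij) * FFT(x_j) ),
    # where blocks[i][j] = w_ij is the defining vector of the (i, j)-th k x k
    # circulant block and x_j is the j-th length-k chunk of the input.
    p, q = len(blocks), len(blocks[0])
    y = np.empty(p * k)
    for i in range(p):
        acc = np.zeros(k, dtype=complex)
        for j in range(q):
            acc += np.fft.fft(blocks[i][j]) * np.fft.fft(x[j * k:(j + 1) * k])
        y[i * k:(i + 1) * k] = np.real(np.fft.ifft(acc))
    return y

# Sanity check against the explicit O(n^2) dense products.
rng = np.random.default_rng(0)
n = 8
c = rng.standard_normal(n)
x = rng.standard_normal(n)
C = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])
assert np.allclose(C @ x, circulant_matvec_fft(c, x))

# Block-circulant check: p = q = 2 blocks of size k = 4.
k, p, q = 4, 2, 2
blocks = [[rng.standard_normal(k) for _ in range(q)] for _ in range(p)]
W = np.block([[np.array([[blocks[i][j][(r - s) % k] for s in range(k)]
                         for r in range(k)]) for j in range(q)] for i in range(p)])
xb = rng.standard_normal(q * k)
assert np.allclose(W @ xb, block_circulant_matvec(blocks, xb, k))
```

Per k×k circulant block, storage drops from k² entries to the k entries of its defining vector, which is the source of the compression described in the abstract.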