International Symposium on Microarchitecture

CirCNN: Accelerating and Compressing Deep Neural Networks Using Block-Circulant Weight Matrices

Abstract

Large-scale deep neural networks (DNNs) are both compute and memory intensive. As the size of DNNs continues to grow, it is critical to improve their energy efficiency and performance while maintaining accuracy. For DNNs, the model size is an important factor affecting performance, scalability, and energy efficiency. Weight pruning achieves good compression ratios but suffers from three drawbacks: 1) the irregular network structure after pruning, which hurts performance and throughput; 2) increased training complexity; and 3) the lack of a rigorous guarantee on compression ratio and inference accuracy. To overcome these limitations, this paper proposes CirCNN, a principled approach to representing weights and processing neural networks using block-circulant matrices. CirCNN utilizes Fast Fourier Transform (FFT)-based fast multiplication, simultaneously reducing the computational complexity (in both inference and training) from O(n²) to O(n log n) and the storage complexity from O(n²) to O(n), with negligible accuracy loss. Compared to other approaches, CirCNN is distinct due to its mathematical rigor: DNNs based on CirCNN can converge to the same “effectiveness” as uncompressed DNNs. We propose the CirCNN architecture, a universal DNN inference engine that can be implemented on various hardware/software platforms with a configurable network architecture (e.g., layer type, size, scales, etc.). In the CirCNN architecture: 1) due to its recursive property, the FFT can be used as the key computing kernel, which ensures universal and small-footprint implementations; 2) the compressed but regular network structure avoids the pitfalls of network pruning and facilitates high performance and throughput with a highly pipelined and parallel design. To demonstrate the performance and energy efficiency, we test CirCNN on FPGA, ASIC, and embedded processors. Our results show that the CirCNN architecture achieves very high energy efficiency and performance with a small hardware footprint. Based on the FPGA implementation and ASIC synthesis results, CirCNN achieves 6-102X energy efficiency improvements compared with the best state-of-the-art results.
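To make the complexity claim concrete, the following is a minimal NumPy sketch (not the paper's implementation) of the FFT-based products that CirCNN builds on: a circulant matrix is fully determined by its first column c, so C·x reduces to a circular convolution computable via FFT in O(n log n) time with O(n) storage, and a block-circulant layer applies the same trick per k×k block, summing in the Fourier domain. Function names and the block layout here are illustrative assumptions.

```python
import numpy as np

def circulant_matvec_fft(c, x):
    # C @ x, where C is the n x n circulant matrix whose first column is c.
    # Computed as an n-point circular convolution in O(n log n) time,
    # storing only the n-entry vector c instead of n^2 weights.
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def block_circulant_matvec(blocks, x, k):
    # Illustrative block-circulant layer: y_i = IFFT( sum_j FFT(w_ij) * FFT(x_j) ),
    # where blocks[i][j] = w_ij is the defining vector of the (i, j)-th k x k
    # circulant block and x_j is the j-th length-k chunk of the input.
    p, q = len(blocks), len(blocks[0])
    y = np.empty(p * k)
    for i in range(p):
        acc = np.zeros(k, dtype=complex)
        for j in range(q):
            acc += np.fft.fft(blocks[i][j]) * np.fft.fft(x[j * k:(j + 1) * k])
        y[i * k:(i + 1) * k] = np.real(np.fft.ifft(acc))
    return y

# Sanity check against the explicit O(n^2) dense products.
rng = np.random.default_rng(0)
n = 8
c = rng.standard_normal(n)
x = rng.standard_normal(n)
C = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])
assert np.allclose(C @ x, circulant_matvec_fft(c, x))

# Block-circulant check: p = q = 2 blocks of size k = 4.
k, p, q = 4, 2, 2
blocks = [[rng.standard_normal(k) for _ in range(q)] for _ in range(p)]
W = np.block([[np.array([[blocks[i][j][(r - s) % k] for s in range(k)]
                         for r in range(k)]) for j in range(q)] for i in range(p)])
xb = rng.standard_normal(q * k)
assert np.allclose(W @ xb, block_circulant_matvec(blocks, xb, k))
```

Per k×k circulant block, storage drops from k² entries to the k entries of its defining vector, which is the source of the compression described in the abstract.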