Journal of VLSI signal processing systems

Compression of Deep Neural Networks with Structured Sparse Ternary Coding



Abstract

Deep neural networks (DNNs) contain a large number of weights and usually require many off-chip memory accesses for inference. Weight compression is a major requirement for on-chip-memory-based implementations of DNNs, as it not only increases inference speed but also reduces power consumption. We propose a weight compression method for deep neural networks that combines pruning and quantization. The proposed method allows weights to take the values +1 or -1 only at predetermined positions. A look-up table then stores all possible combinations of sub-vectors of the weight matrices, so structured sparse weights can be encoded and decoded easily with the table. This method not only allows multiplication-free DNN implementations but also compresses weight storage by as much as 32x compared with floating-point networks, with only a small performance loss. Weight distribution normalization and gradual pruning techniques are applied to reduce the performance degradation. Experiments are conducted with fully connected DNNs and convolutional neural networks.
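The abstract describes the coding pipeline only at a high level. The following minimal NumPy sketch illustrates the general idea of restricting weight sub-vectors to a small ternary codebook and storing one look-up-table index per sub-vector. The sub-vector length K, the at-most-one-nonzero constraint, and the nearest-neighbour assignment are illustrative assumptions, not the paper's exact scheme (which constrains weights during training and applies distribution normalization and gradual pruning).

```python
# Minimal sketch of structured sparse ternary coding (assumptions: K = 4,
# at most S = 1 nonzero +1/-1 entry per sub-vector, nearest-neighbour encoding).
import itertools
import numpy as np

K, S = 4, 1  # sub-vector length and max nonzeros per sub-vector (assumed)

def build_lut(k=K, s=S):
    """Enumerate every ternary sub-vector with at most s nonzero entries."""
    entries = [codes for codes in itertools.product((-1, 0, 1), repeat=k)
               if sum(c != 0 for c in codes) <= s]
    return np.array(entries, dtype=np.int8)           # shape: (num_entries, k)

LUT = build_lut()

def encode(weights):
    """Map each length-K sub-vector of a weight matrix to a LUT index."""
    w = weights.reshape(-1, K)                         # split into sub-vectors
    # Nearest LUT entry in Euclidean distance; the paper instead constrains
    # the weights to the allowed pattern during training.
    dists = np.linalg.norm(w[:, None, :] - LUT[None, :, :], axis=2)
    return dists.argmin(axis=1).astype(np.uint8)       # one small index per sub-vector

def decode(indices, shape):
    """Reconstruct the ternary weight matrix from LUT indices."""
    return LUT[indices].reshape(shape).astype(np.float32)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((8, 8)).astype(np.float32)
    idx = encode(w)
    w_hat = decode(idx, w.shape)
    # Storage drops from 32 bits per weight to one small index per K weights.
    print("original bits:", w.size * 32, "encoded bits:", idx.size * 8)
```

Decoding is a pure table lookup, and because the reconstructed weights are only -1, 0, or +1, the dot products in inference reduce to additions and subtractions, which is the multiplication-free property mentioned in the abstract.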
