2018 55th ACM/ESDA/IEEE Design Automation Conference

An Efficient Kernel Transformation Architecture for Binary- and Ternary-Weight Neural Network Inference

Abstract

While deep convolutional neural networks (CNNs) have emerged as the driving force in a wide range of domains, their computationally and memory-intensive nature hinders further deployment in mobile and embedded applications. Recently, CNNs with low-precision parameters have attracted much research attention. Among them, multiplier-free binary- and ternary-weight CNNs are reported to achieve recognition accuracy comparable to full-precision networks and have been employed to improve hardware efficiency. However, even with the weights constrained to binary and ternary values, large-scale CNNs still require billions of operations in a single forward-propagation pass.

In this paper, we introduce a novel approach that maximally eliminates redundancy in binary- and ternary-weight CNN inference, improving both performance and energy efficiency. The initial kernels are transformed into far fewer and sparser ones, and the output feature maps are rebuilt from the intermediate results, so the total number of operations in convolution is reduced. To find an efficient transformation for each already-trained network, we propose a search algorithm that iteratively matches and eliminates the overlap within a set of kernels. We design a dedicated hardware architecture, with a specialized dataflow and scheduling method, to implement the kernel transformation efficiently. Tested on SVHN, AlexNet, and VGG-16, our architecture removes 43.4%-79.9% of the operations and speeds up inference by 1.48-3.01 times.
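As a rough illustration of the idea described in the abstract, the snippet below is a minimal NumPy sketch, not the authors' search algorithm, dataflow, or hardware design; the names conv2d, transform, and infer are assumptions made here for illustration. It rewrites each ternary kernel as a previously kept base kernel plus a sparse residual whenever that reduces the number of nonzero weights, and rebuilds each output feature map from the base map plus one residual convolution.

```python
# Minimal sketch of the kernel-transformation idea (illustrative only).
# Kernels are ternary {-1, 0, +1}; a kernel is rewritten as an earlier "base"
# kernel plus a sparse residual when the residual has fewer nonzero weights.
# Note: the residual may contain +/-2 entries; a real multiplier-free design
# would handle this differently. The sketch only shows why sharing the overlap
# between kernels reduces the operation count.
import numpy as np


def conv2d(x, k):
    """Plain 'valid' single-channel cross-correlation, stride 1."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out


def transform(kernels):
    """Greedy plan: each entry is (base_index or None, kernel_or_residual)."""
    plan = []
    for i, k in enumerate(kernels):
        best_base, best_res = None, k
        for j in range(i):  # only earlier kernels can serve as bases
            res = k - kernels[j]
            if np.count_nonzero(res) < np.count_nonzero(best_res):
                best_base, best_res = j, res
        plan.append((best_base, best_res))
    return plan


def infer(x, plan):
    """Rebuild every feature map from its base map plus a residual conv."""
    outputs = []
    for base, res in plan:
        fmap = conv2d(x, res)
        if base is not None:
            fmap = fmap + outputs[base]  # reuse the already computed base map
        outputs.append(fmap)
    return outputs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    kernels = rng.integers(-1, 2, size=(8, 3, 3)).astype(float)  # ternary weights
    x = rng.standard_normal((16, 16))

    plan = transform(kernels)
    direct = [conv2d(x, k) for k in kernels]
    rebuilt = infer(x, plan)
    assert all(np.allclose(d, r) for d, r in zip(direct, rebuilt))

    before = sum(int(np.count_nonzero(k)) for k in kernels)
    after = sum(int(np.count_nonzero(r)) for _, r in plan)
    print(f"nonzero weights before/after transform: {before} -> {after}")
```

Because convolution is linear, adding the base feature map to the residual convolution reproduces the original output exactly; the savings come from the residual kernels being much sparser than the originals when kernels overlap heavily.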