IEEE International Symposium on High Performance Computer Architecture
CSCNN: Algorithm-hardware Co-design for CNN Accelerators using Centrosymmetric Filters


Abstract

Convolutional neural networks (CNNs) are at the core of many state-of-the-art deep learning models in computer vision, speech, and text processing. Training and deploying such CNN-based architectures usually require a significant amount of computational resources. Sparsity has emerged as an effective compression approach for reducing the amount of data and computation for CNNs. However, sparsity often results in computational irregularity, which prevents accelerators from fully taking advantage of its benefits for performance and energy improvement. In this paper, we propose CSCNN, an algorithm/hardware co-design framework for CNN compression and acceleration that mitigates the effects of computational irregularity and provides better performance and energy efficiency. On the algorithmic side, CSCNN uses centrosymmetric matrices as convolutional filters. In doing so, it reduces the number of required weights by nearly 50% and enables structured computational reuse without compromising regularity and accuracy. Additionally, complementary pruning techniques are leveraged to further reduce computation by a factor of $2.8$-$7.2\times$ with a marginal accuracy loss. On the hardware side, we propose a CSCNN accelerator that effectively exploits the structured computational reuse enabled by centrosymmetric filters, and further eliminates zero computations for increased performance and energy efficiency. Compared against a dense accelerator, SCNN and SparTen, the proposed accelerator performs $3.7\times$, $1.6\times$ and $1.3\times$ better, and improves the EDP (Energy Delay Product) by $8.9\times$, $2.8\times$ and $2.0\times$, respectively.
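The weight saving and computational reuse mentioned in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the function names (`make_centrosymmetric`, `dot_dense`, `dot_reuse`) and the flat storage layout for the free weights are assumptions made here for illustration. It relies only on the definition of a centrosymmetric K x K filter, w[i][j] = w[K-1-i][K-1-j], which leaves ceil(K*K/2) free weights (5 of 9 for a 3 x 3 filter) and lets mirrored inputs be added before a single multiplication.

```python
def make_centrosymmetric(free_weights, k):
    """Expand ceil(k*k/2) free weights into a k x k centrosymmetric filter.

    In row-major order, position p mirrors to position k*k - 1 - p,
    so both positions share one stored weight.
    """
    n = k * k
    half = (n + 1) // 2
    if len(free_weights) != half:
        raise ValueError(f"expected {half} free weights, got {len(free_weights)}")
    flat = [0] * n
    for p in range(half):
        flat[p] = free_weights[p]
        flat[n - 1 - p] = free_weights[p]
    return [flat[r * k:(r + 1) * k] for r in range(k)]


def dot_dense(window, filt):
    """Reference result: one multiplication per filter position."""
    return sum(w * x
               for w_row, x_row in zip(filt, window)
               for w, x in zip(w_row, x_row))


def dot_reuse(window, free_weights, k):
    """Same result using one multiplication per free weight.

    Mirrored inputs are added first, then multiplied by the shared weight.
    Returns (result, number_of_multiplications).
    """
    n = k * k
    half = (n + 1) // 2
    flat_x = [x for row in window for x in row]
    total = 0
    mults = 0
    for p in range(half):
        q = n - 1 - p  # mirrored position
        paired = flat_x[p] if p == q else flat_x[p] + flat_x[q]
        total += free_weights[p] * paired
        mults += 1
    return total, mults


if __name__ == "__main__":
    free = [1, 2, 3, 4, 5]                      # 5 stored weights instead of 9
    filt = make_centrosymmetric(free, 3)
    window = [[1, 0, 2], [3, 1, 0], [0, 2, 1]]  # one 3 x 3 input patch
    print(filt)
    print(dot_dense(window, filt), dot_reuse(window, free, 3))
```

For a 3 x 3 filter the sketch stores 5 weights and performs 5 multiplications per output instead of 9, a reduction of about 44%; the saving approaches 50% as K grows, which is consistent with the "nearly 50%" figure in the abstract.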