CSCNN: Algorithm-hardware Co-design for CNN Accelerators using Centrosymmetric Filters

Abstract

Convolutional neural networks (CNNs) are at the core of many state-of-the-art deep learning models in computer vision, speech, and text processing. Training and deploying such CNN-based architectures usually require a significant amount of computational resources. Sparsity has emerged as an effective compression approach for reducing the amount of data and computation for CNNs. However, sparsity often results in computational irregularity, which prevents accelerators from fully taking advantage of its benefits for performance and energy improvement. In this paper, we propose CSCNN, an algorithm/hardware co-design framework for CNN compression and acceleration that mitigates the effects of computational irregularity and provides better performance and energy efficiency. On the algorithmic side, CSCNN uses centrosymmetric matrices as convolutional filters. In doing so, it reduces the number of required weights by nearly 50% and enables structured computational reuse without compromising regularity and accuracy. Additionally, complementary pruning techniques are leveraged to further reduce computation by a factor of $2.8$-$7.2\times$ with a marginal accuracy loss. On the hardware side, we propose a CSCNN accelerator that effectively exploits the structured computational reuse enabled by centrosymmetric filters, and further eliminates zero computations for increased performance and energy efficiency. Compared against a dense accelerator, SCNN and SparTen, the proposed accelerator performs $3.7\times$, $1.6\times$ and $1.3\times$ better, and improves the EDP (Energy Delay Product) by $8.9\times$, $2.8\times$ and $2.0\times$, respectively.
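
The weight saving and the computational reuse mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes only the standard definition of a centrosymmetric matrix, W[i][j] == W[K-1-i][K-1-j], and the helper names (make_centrosymmetric, unique_weight_count, dot_dense, dot_reuse) are hypothetical. One convolution output is the dot product of the filter with an input window; because mirrored filter positions share a weight, the two inputs under them can be added first and multiplied once.

```python
import random


def make_centrosymmetric(k, rng):
    """Build a k x k filter with w[i][j] == w[k-1-i][k-1-j] (illustrative helper)."""
    w = [[0.0] * k for _ in range(k)]
    # Only the first ceil(k*k / 2) positions in row-major order are free;
    # each one is copied to its mirror position through the centre.
    for idx in range(unique_weight_count(k)):
        i, j = divmod(idx, k)
        value = rng.uniform(-1.0, 1.0)
        w[i][j] = value
        w[k - 1 - i][k - 1 - j] = value
    return w


def unique_weight_count(k):
    """Number of weights that must be stored for a k x k centrosymmetric filter."""
    return (k * k + 1) // 2


def dot_dense(w, x):
    """Reference dot product of filter and input window: k*k multiplications."""
    k = len(w)
    return sum(w[i][j] * x[i][j] for i in range(k) for j in range(k))


def dot_reuse(w, x):
    """Same result with about half the multiplications.

    Inputs at mirrored positions are added before the shared weight is applied.
    """
    k = len(w)
    n = k * k
    total = 0.0
    for idx in range(n // 2):
        i, j = divmod(idx, k)
        total += w[i][j] * (x[i][j] + x[k - 1 - i][k - 1 - j])
    if n % 2:  # odd k: the centre element has no mirror partner
        c = k // 2
        total += w[c][c] * x[c][c]
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    w = make_centrosymmetric(3, rng)
    x = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]
    print(unique_weight_count(3), "stored weights instead of", 3 * 3)
    print(abs(dot_dense(w, x) - dot_reuse(w, x)) < 1e-9)
```

Under these assumptions a 3x3 filter stores 5 weights instead of 9 and needs 5 multiplications per output instead of 9; the saving approaches 50% as the filter grows, which is consistent with the "nearly 50%" figure in the abstract.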

Bibliographic record

Source
IEEE International Symposium on High Performance Computer Architecture | 2021 | pp. 612-625 | 14 pages
Authors
Jiajun Li; Ahmed Louri; Avinash Karanth; Razvan Bunescu;
Original format: PDF
Keywords
Training; Redundancy; Computer architecture; Filtering algorithms; Energy efficiency; Computational efficiency; Convolutional neural networks;

Similar literature

1. Algorithm-Hardware Co-Design of Real-Time Edge Detection for Deep-Space Autonomous Optical Navigation [J]. Hao XIAO, Yanming FAN, Fen GE. IEICE Transactions on Information and Systems. 2020, No. 10
2. MemFlow: Memory-Driven Data Scheduling With Datapath Co-Design in Accelerators for Large-Scale Inference Applications [J]. Qi Nie, Sharad Malik. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 2020, No. 9
3. Co-design of deep neural nets and neural net accelerators for embedded vision applications [J]. Amid A., Kwon K., Gholami A. IBM Journal of Research and Development. 2019, No. 6
4. Brain-inspired Co-design of Algorithm/Architecture for CNN Accelerators [C]. Stanislav Sedukhin, Kazuya Matsumoto, Yoichi Tomioka. International Congress on Advanced Applied Informatics. 2019
5. Design of Hardware CNN Accelerators for Audio and Image Classification [D]. Gillela, Rohini Jayachandre. 2020
6. Impact of detector selections on inter-institutional variability of flattening filter-free beam data for TrueBeam™ linear accelerators [O]. Yoshihiro Tanaka, Yuichi Akino, Hirokazu Mizuno. 2020
7. Performance evaluation over HW/SW co-design SoC memory transfers for a CNN accelerator [O]. A. Rios-Navarro, R. Tapiador-Morales, A. Jimenez-Fernandez. 2018
