Conference: Design, Automation and Test in Europe Conference and Exhibition

PSB-RNN: A Processing-in-Memory Systolic Array Architecture using Block Circulant Matrices for Recurrent Neural Networks



Abstract

Recurrent Neural Networks (RNNs) are widely used in Natural Language Processing (NLP) applications as they inherently capture contextual information across spatial and temporal dimensions. Compared to other classes of neural networks, RNNs have more weight parameters as they primarily consist of fully connected layers. Recently, several techniques such as weight pruning, zero-skipping, and block circulant compression have been introduced to reduce the storage and access requirements of RNN weight parameters. In this work, we present a ReRAM crossbar based processing-in-memory (PIM) architecture with systolic dataflow incorporating block circulant compression for RNNs. The block circulant compression decomposes the operations in a fully connected layer into a series of Fourier transforms and point-wise operations resulting in reduced space and computational complexity. We formulate the Fourier transform and point-wise operations into in-situ multiply-and-accumulate (MAC) operations mapped to ReRAM crossbars for high energy efficiency and throughput. We also incorporate systolic dataflow for communication within the crossbar arrays, in contrast to broadcast and multicast communications, to further improve energy efficiency. The proposed architecture achieves average improvements in compute efficiency of 44x and 17x over a custom FPGA architecture and conventional crossbar based architecture implementations, respectively.
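
The decomposition described in the abstract can be illustrated in software. The following is a minimal sketch, not the authors' implementation: it assumes each b x b block is circulant and defined by its first column (so a block stores b values instead of b^2), and it writes the Fourier transform as an explicit matrix-vector product to echo the multiply-and-accumulate formulation. The function names, the block layout, and the dense reference check are all hypothetical and chosen for illustration; crossbar mapping, quantization, and the systolic dataflow are not modelled.

```python
import cmath


def dft_matrix(b, inverse=False):
    """Return the b x b DFT matrix (or its inverse) as nested lists."""
    sign = 1 if inverse else -1
    scale = 1.0 / b if inverse else 1.0
    return [[scale * cmath.exp(sign * 2j * cmath.pi * k * n / b)
             for n in range(b)] for k in range(b)]


def matvec(m, v):
    """Plain multiply-and-accumulate matrix-vector product."""
    return [sum(m_kn * v_n for m_kn, v_n in zip(row, v)) for row in m]


def circulant(w):
    """Dense circulant block with first column w: C[i][j] = w[(i - j) % b]."""
    b = len(w)
    return [[w[(i - j) % b] for j in range(b)] for i in range(b)]


def block_circulant_matvec(blocks, x, b):
    """Multiply a block circulant matrix by x using transforms.

    blocks[i][j] is the defining vector (length b) of block (i, j);
    x has length q * b, where q is the number of block columns.
    """
    fwd, inv = dft_matrix(b), dft_matrix(b, inverse=True)
    q = len(blocks[0])
    # Transform each input segment once; it is reused by every block row.
    x_hat = [matvec(fwd, x[j * b:(j + 1) * b]) for j in range(q)]
    y = []
    for row in blocks:
        acc = [0j] * b
        for w, xh in zip(row, x_hat):
            # Weights are fixed after training, so this transform could be
            # precomputed offline (an assumption of this sketch).
            w_hat = matvec(fwd, w)
            # Point-wise multiply, accumulated in the frequency domain.
            acc = [a + wk * xk for a, wk, xk in zip(acc, w_hat, xh)]
        # One inverse transform per block row.
        y.extend(v.real for v in matvec(inv, acc))
    return y


def dense_matvec(blocks, x, b):
    """Reference result: expand every block to dense form and multiply."""
    y = []
    for row in blocks:
        acc = [0.0] * b
        for j, w in enumerate(row):
            part = matvec(circulant(w), x[j * b:(j + 1) * b])
            acc = [a + p for a, p in zip(acc, part)]
        y.extend(acc)
    return y


if __name__ == "__main__":
    blocks = [[[1, 2, 3, 4], [0, 1, 0, 0]],
              [[2, 0, 0, 1], [1, 1, 1, 1]]]
    x = [1, 0, -1, 2, 3, 1, 0, -2]
    print(block_circulant_matvec(blocks, x, 4))
    print(dense_matvec(blocks, x, 4))
```

The two printed vectors should agree up to floating-point rounding. Accumulating in the frequency domain means only one inverse transform is needed per block row, which is where the reduction in computation comes from in this sketch.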
