IEEE Transactions on Computers

NNPIM: A Processing In-Memory Architecture for Neural Network Acceleration


Abstract

Neural networks (NNs) have shown great ability to process emerging applications such as speech recognition, language recognition, image classification, video segmentation, and gaming. It is therefore important to make NNs efficient. Although attempts have been made to reduce NNs' computation cost, the data movement between memory and processing cores remains the main bottleneck for NNs' energy consumption and execution time, making NN implementations significantly slower on traditional CPU/GPU cores. In this paper, we propose a novel processing in-memory architecture, called NNPIM, that significantly accelerates the neural network inference phase inside memory. First, we design a crossbar memory architecture that supports fast addition, multiplication, and search operations inside the memory. Second, we introduce simple optimization techniques that significantly improve NNs' performance and reduce the overall energy consumption. We also map all NN functionalities onto parallel in-memory components. To further improve efficiency, our design supports weight sharing, which reduces the number of computations performed in memory and consequently speeds up NNPIM computation. We compare the efficiency of the proposed NNPIM with a GPU and state-of-the-art PIM architectures. Our evaluation shows that our design achieves 131.5x higher energy efficiency and is 48.2x faster than an NVIDIA GTX 1080 GPU. Compared to state-of-the-art neural network accelerators, NNPIM achieves on average 3.6x higher energy efficiency and is 4.6x faster, while providing the same classification accuracy.
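To make the weight-sharing claim concrete, the following is a minimal NumPy sketch, not the paper's implementation: it assumes a simple uniform codebook (the abstract does not specify how NNPIM chooses its shared values), and the names `build_codebook` and `shared_weight_dot` are illustrative. It demonstrates the computation reduction the abstract describes: with k shared weight values, an N-element dot product needs only k multiplications instead of N, because all inputs mapped to the same shared weight can be accumulated first.

```python
import numpy as np

def build_codebook(weights, k=16):
    """Quantize weights to k shared values (a uniform codebook).

    Hypothetical stand-in for NNPIM's weight-sharing step; the
    abstract does not state how the shared values are selected.
    """
    centroids = np.linspace(weights.min(), weights.max(), k)
    idx = np.abs(weights[:, None] - centroids).argmin(axis=1)
    return centroids, idx  # k shared values + per-weight index

def shared_weight_dot(x, centroids, idx):
    """Dot product under weight sharing: accumulate all inputs that
    map to the same shared weight, then do only k multiplications."""
    sums = np.zeros_like(centroids)
    np.add.at(sums, idx, x)      # N additions (cheap in-memory ops)
    return sums @ centroids      # only k multiplications total

x = np.random.randn(1024)
w = np.random.randn(1024)
centroids, idx = build_codebook(w, k=16)
print(shared_weight_dot(x, centroids, idx))  # approximates x @ w
print(x @ w)                                 # exact, for comparison
```

In a PIM setting, the remaining k multiplications could additionally be served from precomputed lookup tables retrieved by the in-memory search operation; this is one plausible use of the fast addition, multiplication, and search support the abstract describes, not a detail confirmed by it.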
