IEEE Transactions on Computers

MViD: Sparse Matrix-Vector Multiplication in Mobile DRAM for Accelerating Recurrent Neural Networks



Abstract

Recurrent Neural Networks (RNNs) spend most of their execution time performing matrix-vector multiplication (MV-mul). Because the matrices in RNNs have poor reusability and their ever-increasing size makes them too large to fit in the on-chip storage of mobile/IoT devices, the performance and energy efficiency of MV-mul are determined by those of main-memory DRAM. Computing MV-mul within DRAM has therefore drawn much attention. However, previous studies lacked consideration for matrix sparsity, the power constraints of DRAM devices, and concurrency in accessing DRAM from processors while performing MV-mul. We propose a main-memory architecture called MViD, which performs MV-mul by placing MAC units inside DRAM banks. For higher computational efficiency, we use a sparse matrix format and exploit quantization. Because of the limited power budget for DRAM devices, we implement the MAC units on only a portion of the DRAM banks. We architect MViD to slow down or pause MV-mul so that memory requests from processors can be serviced concurrently while the limited power budget is satisfied. Our results show that MViD provides 7.2x higher throughput compared to the baseline system with four DRAM ranks (performing MV-mul in a chip-multiprocessor) while running inference of Deep Speech 2 with a memory-intensive workload.
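The core operation MViD accelerates is sparse matrix-vector multiplication. As a point of reference, a minimal software sketch of MV-mul over a matrix stored in the standard CSR (compressed sparse row) format is shown below; the abstract does not specify which sparse format MViD uses, so CSR here is an illustrative assumption, and the function name is hypothetical.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x where A is stored in CSR format.

    values  : nonzero entries of A, row by row
    col_idx : column index of each nonzero entry
    row_ptr : row_ptr[r]..row_ptr[r+1] delimit row r's entries
    x       : dense input vector
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for r in range(n_rows):
        # Each nonzero contributes one multiply-accumulate (MAC) operation,
        # the unit of work MViD places inside DRAM banks.
        for i in range(row_ptr[r], row_ptr[r + 1]):
            y[r] += values[i] * x[col_idx[i]]
    return y


# Example: A = [[0, 2, 0],
#               [1, 0, 3]],  x = [1, 1, 1]
y = csr_spmv(values=[2.0, 1.0, 3.0],
             col_idx=[1, 0, 2],
             row_ptr=[0, 1, 3],
             x=[1.0, 1.0, 1.0])
# y == [2.0, 4.0]
```

Because CSR stores only nonzeros, the number of MACs (and hence DRAM-side work) scales with the sparsity of the RNN weight matrices rather than their full dimensions, which is what makes exploiting sparsity attractive under a tight DRAM power budget.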

