IEEE Computer Architecture Letters

pPIM: A Programmable Processor-in-Memory Architecture With Precision-Scaling for Deep Learning

Abstract

Memory access latencies and low data transfer bandwidth limit the processing speed of many data-intensive applications, such as Convolutional Neural Networks (CNNs), in conventional Von Neumann architectures. Processing in Memory (PIM) is envisioned as a potential hardware solution for such applications, since data access bottlenecks can be avoided in PIM by performing computations within the memory die. However, PIM realizations that embed logic-based complex processing units within the memory present complicated fabrication challenges. In this letter, we propose to leverage the existing memory infrastructure to implement a programmable PIM (pPIM), a novel Look-Up-Table (LUT)-based PIM in which all the processing units are implemented solely with LUTs, as opposed to prior LUT-based PIM implementations that combine LUTs with logic circuitry for computations. This enables pPIM to perform ultra-low-power and low-latency operations with minimal fabrication complications. Moreover, the fully LUT-based design offers simple "memory write"-based programmability in pPIM. Enabling precision scaling further improves the performance and the power consumption for CNN applications, and the programmability potentially facilitates online training implementations. Our preliminary simulations demonstrate that the proposed pPIM can achieve 2000x, 657.5x, and 1.46x improvements in inference throughput per unit power consumption compared to a state-of-the-art conventional processor architecture, Graphics Processing Units (GPUs), and a prior hybrid LUT-logic-based PIM, respectively. Furthermore, precision scaling improves the energy efficiency of pPIM by approximately 1.35x over its full-precision operation.
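To make the letter's central idea concrete, the sketch below models a pPIM-style processing unit in Python: each unit is nothing but a lookup table, and "programming" a unit is simply a memory write that fills the table with a new function's outputs, so no dedicated logic circuitry is needed at compute time. The 4-bit operand width and the specific function set are illustrative assumptions, not details taken from the letter.

```python
# Hypothetical sketch of a LUT-only processing unit in the spirit of pPIM.
# Assumptions (not from the letter): 4-bit operands, 8-bit outputs, and a
# software dict standing in for the physical LUT memory array.

OPERAND_BITS = 4
SIZE = 1 << OPERAND_BITS  # 16 possible values per operand

def program_lut(fn):
    """The 'memory write' step: fill a 16x16 table with fn's outputs."""
    return [[fn(a, b) & 0xFF for b in range(SIZE)] for a in range(SIZE)]

def lut_op(lut, a, b):
    """Execute one operation purely by table lookup -- no arithmetic logic."""
    return lut[a & (SIZE - 1)][b & (SIZE - 1)]

# Program one core as a 4-bit multiplier, then reprogram it as an adder:
mul_lut = program_lut(lambda a, b: a * b)
add_lut = program_lut(lambda a, b: a + b)

print(lut_op(mul_lut, 9, 7))  # 63
print(lut_op(add_lut, 9, 7))  # 16
```

Reprogramming is just another call to `program_lut`, which mirrors why a fully LUT-based design gives memory-write programmability for free; precision scaling would correspond to shrinking the operand width (and thus the table size) for lower-precision CNN layers.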
