IEEE Computer Architecture Letters

pPIM: A Programmable Processor-in-Memory Architecture With Precision-Scaling for Deep Learning

Abstract

Memory access latencies and low data transfer bandwidth limit the processing speed of many data-intensive applications, such as Convolutional Neural Networks (CNNs), in conventional Von Neumann architectures. Processing in Memory (PIM) is envisioned as a potential hardware solution for such applications, since data access bottlenecks can be avoided in PIM by performing computations within the memory die. However, PIM realizations that embed logic-based complex processing units within the memory present complicated fabrication challenges. In this letter, we propose to leverage the existing memory infrastructure to implement a programmable PIM (pPIM), a novel Look-Up-Table (LUT)-based PIM in which all the processing units are implemented solely with LUTs, as opposed to prior LUT-based PIM implementations that combine LUTs with logic circuitry for computations. This enables pPIM to perform ultra-low-power and low-latency operations with minimal fabrication complications. Moreover, the fully LUT-based design offers simple "memory write"-based programmability in pPIM. Enabling precision scaling further improves the performance and the power consumption for CNN applications, and the programmability potentially facilitates online training implementations. Our preliminary simulations demonstrate that the proposed pPIM can achieve 2000x, 657.5x, and 1.46x improvements in inference throughput per unit power consumption compared to a state-of-the-art conventional processor architecture, Graphics Processing Units (GPUs), and a prior hybrid LUT-logic-based PIM, respectively. Furthermore, precision scaling improves the energy efficiency of pPIM by approximately 1.35x over its full-precision operation.
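To make the letter's central idea concrete, the sketch below models a pPIM-style processing unit in Python: each unit is nothing but a lookup table, and "programming" a unit is simply a memory write that fills the table with a new function's outputs, so no dedicated logic circuitry is needed at compute time. The 4-bit operand width and the specific function set are illustrative assumptions, not details taken from the letter.

```python
# Hypothetical sketch of a LUT-only processing unit in the spirit of pPIM.
# Assumptions (not from the letter): 4-bit operands, 8-bit outputs, and a
# software dict standing in for the physical LUT memory array.

OPERAND_BITS = 4
SIZE = 1 << OPERAND_BITS  # 16 possible values per operand

def program_lut(fn):
    """The 'memory write' step: fill a 16x16 table with fn's outputs."""
    return [[fn(a, b) & 0xFF for b in range(SIZE)] for a in range(SIZE)]

def lut_op(lut, a, b):
    """Execute one operation purely by table lookup -- no arithmetic logic."""
    return lut[a & (SIZE - 1)][b & (SIZE - 1)]

# Program one core as a 4-bit multiplier, then reprogram it as an adder:
mul_lut = program_lut(lambda a, b: a * b)
add_lut = program_lut(lambda a, b: a + b)

print(lut_op(mul_lut, 9, 7))  # 63
print(lut_op(add_lut, 9, 7))  # 16
```

Reprogramming is just another call to `program_lut`, which mirrors why a fully LUT-based design gives memory-write programmability for free; precision scaling would correspond to shrinking the operand width (and thus the table size) for lower-precision CNN layers.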
