Conference: International Symposium on Microarchitecture

DRISA: A DRAM-based Reconfigurable In-Situ Accelerator



Abstract

Data movement between the processing units and the memory in the traditional von Neumann architecture is creating the “memory wall” problem. To bridge the gap, two approaches have been studied: the memory-rich processor (more on-chip memory) and the compute-capable memory (processing-in-memory). However, the first has strong computing capability but limited memory capacity/bandwidth, whereas the second is the exact opposite.

To address the challenge, we propose DRISA, a DRAM-based Reconfigurable In-Situ Accelerator architecture, to provide both powerful computing capability and large memory capacity/bandwidth. DRISA is primarily composed of DRAM memory arrays, in which every memory bitline can perform bitwise Boolean logic operations (such as NOR). DRISA can be reconfigured to compute various functions by combining these functionally complete Boolean logic operations with the proposed hierarchical internal data movement designs. We further optimize DRISA for high performance by simultaneously activating multiple rows and subarrays to provide massive parallelism, unblocking the internal data movement bottlenecks, and optimizing activation latency and energy. We explore four design options and present a comprehensive case study to demonstrate significant acceleration of convolutional neural networks. The experimental results show that DRISA can achieve 8.8× speedup and 1.2× better energy efficiency compared with ASICs, and 7.7× speedup and 15× better energy efficiency over GPUs with integer operations.
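The abstract's claim that a functionally complete bitline operation such as NOR is enough to compute other functions can be illustrated in software. The following is a minimal sketch and not the DRISA circuit or its microarchitecture: a Python integer stands in for one memory row, each bit position stands in for one bitline, and the row width `ROW_BITS`, the bit-plane data layout, and all function names are assumptions made for this illustration. It derives NOT, OR, AND, and XOR from NOR alone, then adds many small integers in parallel, one bit-plane at a time.

```python
# Illustrative model only: one Python int = one memory row, one bit = one bitline.
ROW_BITS = 8                    # hypothetical row width (number of bitlines)
MASK = (1 << ROW_BITS) - 1      # keeps results within the row width


def nor(a: int, b: int) -> int:
    """Row-wide NOR: the only primitive this sketch assumes."""
    return ~(a | b) & MASK


# Every other gate below is built from nor() alone.
def not_(a: int) -> int:
    return nor(a, a)


def or_(a: int, b: int) -> int:
    return not_(nor(a, b))


def and_(a: int, b: int) -> int:
    return nor(not_(a), not_(b))


def xor_(a: int, b: int) -> int:
    # (a OR b) AND NOT (a AND b), expressed with NOR
    return nor(and_(a, b), nor(a, b))


def to_planes(values: list[int], nbits: int) -> list[int]:
    """Transpose values into bit-planes: plane i holds bit i of every lane."""
    return [
        sum(((v >> i) & 1) << lane for lane, v in enumerate(values))
        for i in range(nbits)
    ]


def from_planes(planes: list[int], lanes: int) -> list[int]:
    """Inverse of to_planes: rebuild one integer per lane."""
    return [
        sum(((p >> lane) & 1) << i for i, p in enumerate(planes))
        for lane in range(lanes)
    ]


def add_planes(a_planes: list[int], b_planes: list[int]) -> list[int]:
    """Ripple-carry addition across all lanes at once, LSB plane first."""
    carry = 0
    out = []
    for a, b in zip(a_planes, b_planes):
        half = xor_(a, b)
        out.append(xor_(half, carry))               # sum bit for every lane
        carry = or_(and_(a, b), and_(half, carry))  # carry out for every lane
    out.append(carry)                               # final carry plane
    return out


if __name__ == "__main__":
    a = [3, 5, 7, 1, 0, 15, 9, 2]
    b = [4, 1, 9, 1, 0, 15, 6, 13]
    sums = from_planes(add_planes(to_planes(a, 4), to_planes(b, 4)), ROW_BITS)
    print(sums)
```

Each call to `nor` processes every lane of the row in a single step, which is the software analogue of the row-level parallelism the abstract describes; the number of steps grows with operand bit width rather than with the number of operands.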
