Annual IEEE/ACM International Symposium on Microarchitecture

SCOPE: A Stochastic Computing Engine for DRAM-Based In-Situ Accelerator



Abstract

Memory-centric architecture, which bridges the gap between compute and memory, is considered a promising solution to the memory wall and the power wall. Such architectures integrate computing logic and memory resources close to each other in order to exploit the large internal memory bandwidth and reduce data-movement overhead. The closer the compute and memory resources are located, the greater these benefits become. DRAM-based in-situ accelerators [1] tightly couple processing units to every memory bitline, achieving the maximum benefit among memory-centric architectures. However, the processing units in such architectures are typically limited to simple functions like AND/OR due to strict area and power constraints in DRAM, making it difficult to accomplish complex tasks while still providing high performance. In this paper, we address this challenge by applying stochastic computing arithmetic to the DRAM-based in-situ accelerator, targeting the acceleration of error-tolerant applications such as deep learning. In stochastic computing, binary numbers are converted into stochastic bitstreams, which turns integer multiplications into simple bitwise AND operations, at the expense of larger memory capacity/bandwidth demands. Stochastic computing is a natural match for DRAM-based in-situ accelerators: it addresses the in-situ accelerator's low-performance problem by simplifying the operations, while leveraging the in-situ accelerator's advantage of large memory capacity/bandwidth. To further boost performance and compensate for the loss of numerical precision, we propose a novel Hierarchical and Hybrid Deterministic (H2D) stochastic computing arithmetic. Finally, we consider quantized deep neural network inference and training as a case study. The proposed architecture provides a 2.3× improvement in performance per unit area over the binary-arithmetic baseline, and a 3.8× improvement over a GPU. The proposed H2D arithmetic contributes an 11× performance boost and a 60% improvement in numerical precision.
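The core trick the abstract describes — encoding values as stochastic bitstreams so that multiplication reduces to bitwise AND — can be sketched in a few lines of Python. This is a toy software illustration of the arithmetic only, not the paper's DRAM-based implementation or its H2D scheme; the function names and the stream length are illustrative choices:

```python
import random

def to_bitstream(p, length, rng):
    # Encode a probability p in [0, 1] as a stochastic bitstream:
    # each bit is 1 independently with probability p.
    return [1 if rng.random() < p else 0 for _ in range(length)]

def sc_multiply(a_bits, b_bits):
    # Bitwise AND of two independent stochastic streams: a bit of the
    # result is 1 only when both input bits are 1, which happens with
    # probability a*b -- so the AND stream encodes the product.
    return [x & y for x, y in zip(a_bits, b_bits)]

def decode(bits):
    # Recover the encoded value as the fraction of 1s in the stream.
    return sum(bits) / len(bits)

rng = random.Random(0)
N = 10_000                      # longer streams -> lower variance
a, b = 0.5, 0.25
prod = decode(sc_multiply(to_bitstream(a, N, rng),
                          to_bitstream(b, N, rng)))
# prod approximates a*b = 0.125, within stochastic-sampling error
```

The example also makes the abstract's trade-off concrete: an 8-bit exact multiply is replaced by N AND gates' worth of work, so accuracy is bought with stream length, i.e. with memory capacity and bandwidth — exactly the resource an in-DRAM accelerator has in abundance.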
