CASH-RAM: Enabling In-Memory Computations for Edge Inference Using Charge Accumulation and Sharing in Standard 8T-SRAM Arrays

Abstract

Machine Learning (ML) workloads, being memory- and compute-intensive, consume large amounts of power when run on conventional computing systems, restricting their implementations to large-scale data centers. Transferring large amounts of data from edge devices to data centers is not only energy-expensive, but sometimes undesirable in security-critical applications. Thus, there is a need for domain-specific hardware primitives for energy-efficient ML processing at the edge. One such approach, in-memory computing, eliminates frequent and unnecessary data transfers between the memory and the compute units by computing directly on the data where it is stored. However, the analog nature of the computations introduces non-idealities, which degrade the overall accuracy of neural networks. In this paper, we propose an in-memory computing primitive for accelerating dot-products within standard 8T-SRAM caches using charge-sharing. The inherent parasitic capacitance of the bitlines and sourcelines is used to accumulate analog voltages, which can be sensed to obtain an approximate dot-product. The charge-sharing approach involves a self-compensation technique that reduces the effects of the non-idealities, thereby reducing the errors. Our results for ternary-weight neural networks show that, using the proposed compensation approaches, the accuracy degradation is within 1% and 5% of the baseline accuracy for the MNIST and CIFAR-10 datasets, respectively, with an energy-delay product improvement of $38\times$ over a standard von Neumann computing system. We believe that this work can be used in conjunction with existing mitigation techniques, such as re-training approaches, to further enhance system performance.
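To make the mechanism concrete, the following is a minimal Python behavioral sketch, not the paper's circuit or its actual self-compensation scheme: the function names, the droop coefficient alpha (a hypothetical value-dependent charge-loss model standing in for the analog non-idealities), and the first-order correction in compensated_dot are all assumptions invented for this illustration.

    import numpy as np

    def charge_sharing_dot(x, w, alpha=0.04):
        """Behavioral sketch of an approximate dot-product computed by
        charge accumulation and sharing on a column's parasitic lines.
        x: 0/1 input activations; w: ternary weights in {-1, 0, +1}.
        alpha: hypothetical coefficient modelling value-dependent droop
        of the accumulated analog voltage (an assumed non-ideality)."""
        pos = int(np.sum(x * (w > 0)))  # cells accumulating charge for +1 weights
        neg = int(np.sum(x * (w < 0)))  # cells accumulating charge for -1 weights
        droop = lambda n: n * (1.0 - alpha * n)  # non-ideal accumulation
        return droop(pos) - droop(neg)           # differentially sensed value

    def compensated_dot(x, w, alpha=0.04):
        """Illustrative stand-in for the self-compensation idea: estimate
        the droop from the total active charge and divide it out
        (a first-order correction, invented for this sketch)."""
        raw = charge_sharing_dot(x, w, alpha)
        n_active = int(np.sum((x != 0) & (w != 0)))  # proxy for accumulated charge
        return raw / max(1.0 - alpha * n_active, 1e-6)

    # Quick check against the exact ternary dot-product
    rng = np.random.default_rng(0)
    x = rng.integers(0, 2, size=16)
    w = rng.integers(-1, 2, size=16)
    print(int(x @ w), charge_sharing_dot(x, w), compensated_dot(x, w))

Running the check shows the uncompensated value drifting away from the exact dot-product as more cells accumulate charge, while the compensated value tracks it more closely, which is the qualitative behavior the abstract describes.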