Published in: IEEE Transactions on Computers

BLADE: An in-Cache Computing Architecture for Edge Devices



Abstract

Area- and power-constrained edge devices are increasingly used to perform compute-intensive workloads, which calls for ever more area- and power-efficient accelerators. In this context, in-SRAM computing performs hundreds of parallel operations on spatially local data, a pattern common in many emerging workloads, while reducing the power consumed by data movement. However, in-SRAM computing faces many challenges, including integration into existing architectures, support for arithmetic operations, data corruption at high operating frequencies, inability to run at low voltages, and low area density. To meet these challenges, this article introduces BLADE, a BitLine Accelerator for Devices on the Edge. BLADE is an in-SRAM computing architecture that uses local wordline groups to perform computations at a frequency 2.8x higher than state-of-the-art in-SRAM computing architectures. BLADE is integrated into the cache hierarchy of low-voltage edge devices, and is simulated and benchmarked at the transistor, architecture, and software abstraction levels. Experimental results demonstrate performance and energy gains over an equivalent NEON-accelerated processor for a variety of edge-device workloads, namely cryptography (4x performance gain / 6x energy reduction), video encoding (6x / 2x), and convolutional neural networks (3x / 1.5x), while maintaining the highest frequency/energy ratio (up to 2.2 GHz at 1 V) of any conventional in-SRAM computing architecture and a low area overhead of less than 8 percent.
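The abstract does not describe BLADE's circuits, so the following is only a rough illustration of the general bitline-computing idea that in-SRAM architectures build on. It uses an idealized model, which is an assumption and not BLADE's design: when two wordlines are activated at once, the bitline (BL) senses the AND of the two stored rows and the complementary bitline (BLB) senses their NOR. XOR and a ripple-carry addition are then derived from those two results. The names `activate_two_rows`, `bitline_xor`, and `bitline_add`, the least-significant-bit-first row layout, and the carry loop are all hypothetical choices made for this sketch.

```python
from typing import List, Tuple

Row = List[int]  # one SRAM row: one 0/1 bit per bitline column (hypothetical layout, LSB first)


def to_row(value: int, width: int) -> Row:
    """Store an unsigned integer as a row of bits, least significant column first."""
    return [(value >> i) & 1 for i in range(width)]


def from_row(row: Row) -> int:
    """Read a row of bits back as an unsigned integer."""
    return sum(bit << i for i, bit in enumerate(row))


def activate_two_rows(row_a: Row, row_b: Row) -> Tuple[Row, Row]:
    """Idealized two-wordline read (assumption, not BLADE's circuit).

    BL stays high only if both cells hold 1 (AND); BLB stays high only if
    both cells hold 0 (NOR). Every column is evaluated in the same step.
    """
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same width")
    bl = [a & b for a, b in zip(row_a, row_b)]
    blb = [int(not (a | b)) for a, b in zip(row_a, row_b)]
    return bl, blb


def bitline_xor(row_a: Row, row_b: Row) -> Row:
    """XOR is 1 exactly when neither the AND nor the NOR result is 1."""
    bl, blb = activate_two_rows(row_a, row_b)
    return [int(not (x | y)) for x, y in zip(bl, blb)]


def shift_left(row: Row) -> Row:
    """Move every bit one column toward the MSB and drop the overflow bit."""
    return [0] + row[:-1]


def bitline_add(row_a: Row, row_b: Row) -> Row:
    """Fixed-width addition (mod 2**width) built only from the AND/NOR reads.

    Partial sum = a XOR b; carries = (a AND b) shifted one column. Repeat
    until no carries remain (at most width + 1 iterations).
    """
    a, b = row_a, row_b
    while any(b):
        bl, _ = activate_two_rows(a, b)
        a, b = bitline_xor(a, b), shift_left(bl)
    return a
```

For example, `from_row(bitline_add(to_row(100, 8), to_row(27, 8)))` gives 127. In this model each list element stands for one bitline column, so a single two-row activation produces results for every column at once. That column-level parallelism is the kind of data-parallel operation the abstract attributes to in-SRAM computing. Multi-bit arithmetic needs extra steps for carry propagation, which is one reason arithmetic support is listed as a challenge.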

