Conference: SOI-3D-Subthreshold Microelectronics Technology Unified Conference

Multi-ANN embedded system based on a custom 3D-DRAM

Abstract

Machine learning in the form of Artificial Neural Networks (ANNs) has gained traction over the last few years, especially in applications such as image recognition and speech recognition. These applications typically employ a subset of ANNs known as Convolutional Neural Networks (CNNs), which reuse parameters and thus reduce main-memory bandwidth. However, other types of ANN, such as autoencoders and Long Short-Term Memory (LSTM) networks, do not provide reuse opportunities. It is generally accepted that dynamic random-access memory (DRAM) is required to store the parameters of usefully sized ANNs. To achieve a given performance, CNN-specific implementations use cache-like structures built from static random-access memory (SRAM), which minimize accesses to the slower DRAM. Most research has focused on implementing CNNs, but because these implementations rely extensively on SRAM, they suffer both ANN size restrictions and performance degradation in applications that use other types of ANN. This work considers embedded applications employing multiple disparate generic ANNs which, assuming limited reuse opportunities in the form of parameter reuse or batch processing, will require usable memory bandwidth on the order of tens of Tbit/s. This work supports Deep Neural Networks (DNNs) that do not provide ANN parameter reuse and suggests that these types of applications will require all ANN parameters in main memory to be accessed in real time. This work coins the phrase “goldilocks bandwidth” for ANN systems that provide the bandwidth required to read all ANN parameters at a real-time rate. This work employs pure 3DIC technology along with a proposed custom 3D-DRAM that exposes an entire page over a very wide data bus (Fig. 3). The 3DIC system die stack (Fig. 1) includes the 3D-DRAM, a system manager layer, and a Processing Engine (PE) layer, collectively known as a Sub-System Column (SSC) (Fig. 4).
The targeted 3D-DRAM, the Tezzaron DiRAM4 [1], employs multiple memory array layers in conjunction with a control and I/O layer and provides 64 separate vaults, each providing 1 Gbit of storage; together with the suggested customizations, this gives this work up to 65 Tbit/s of bandwidth.
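
The bandwidth figures in the abstract can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, it assumes the full 64 x 1 Gbit capacity is occupied by ANN parameters, and the 1000 passes-per-second rate is an assumed example value, not a figure from the paper.

```python
# Back-of-the-envelope check of the abstract's numbers.
# Function names and the example pass rate are illustrative assumptions.

GBIT = 1e9   # bits per Gbit
TBIT = 1e12  # bits per Tbit


def required_bandwidth(param_bits: float, passes_per_second: float) -> float:
    """Bandwidth (bit/s) needed to stream every parameter once per pass,
    assuming no parameter reuse and no batching."""
    return param_bits * passes_per_second


def full_reads_per_second(capacity_bits: float, bandwidth: float) -> float:
    """How many times per second the whole memory can be read end to end."""
    return bandwidth / capacity_bits


# Figures stated in the abstract.
VAULTS = 64
BITS_PER_VAULT = 1 * GBIT
PEAK_BANDWIDTH = 65 * TBIT

# Assumption: all of the DRAM holds ANN parameters.
capacity_bits = VAULTS * BITS_PER_VAULT  # 64 Gbit in total

sweeps = full_reads_per_second(capacity_bits, PEAK_BANDWIDTH)

# Hypothetical workload: every parameter read 1000 times per second.
needed = required_bandwidth(capacity_bits, passes_per_second=1000)

print(f"capacity: {capacity_bits / GBIT:.0f} Gbit")
print(f"full-memory reads per second at peak: {sweeps:.1f}")
print(f"bandwidth for 1000 passes/s: {needed / TBIT:.0f} Tbit/s")
```

Under these assumptions, the stated peak of 65 Tbit/s is enough to read the entire 64 Gbit roughly a thousand times per second, which is consistent with the abstract's "tens of Tbit/s" requirement for reading all parameters at a real-time rate.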