Journal: Computer architecture news

Tetris: Scalable and Efficient Neural Network Acceleration with 3D Memory



Abstract

The high accuracy of deep neural networks (NNs) has led to the development of NN accelerators that improve performance by two orders of magnitude. However, scaling these accelerators for higher performance with increasingly larger NNs exacerbates the cost and energy overheads of their memory systems, including the on-chip SRAM buffers and the off-chip DRAM channels. This paper presents the hardware architecture and software scheduling and partitioning techniques for Tetris, a scalable NN accelerator using 3D memory. First, we show that the high throughput and low energy characteristics of 3D memory allow us to rebalance the NN accelerator design, using more area for processing elements and less area for SRAM buffers. Second, we move portions of the NN computations close to the DRAM banks to decrease bandwidth pressure and increase performance and energy efficiency. Third, we show that despite the use of small SRAM buffers, the presence of 3D memory simplifies dataflow scheduling for NN computations. We present an analytical scheduling scheme that matches the efficiency of schedules derived through exhaustive search. Finally, we develop a hybrid partitioning scheme that parallelizes the NN computations over multiple accelerators. Overall, we show that Tetris improves the performance by 4.1x and reduces the energy by 1.5x over NN accelerators with conventional, low-power DRAM memory systems.
