IEEE International Symposium on High Performance Computer Architecture

NeuroMeter: An Integrated Power, Area, and Timing Modeling Framework for Machine Learning Accelerators (Industry Track Paper)



Abstract

As Machine Learning (ML) becomes pervasive in the era of artificial intelligence, ML-specific tools and frameworks are required for architectural research. This paper introduces NeuroMeter, an integrated power, area, and timing modeling framework for ML accelerators. NeuroMeter models the detailed architecture of ML accelerators and generates fast and accurate estimates of power, area, and chip timing. It also enables runtime analysis of system-level performance and efficiency when runtime activity factors are provided. NeuroMeter’s micro-architecture model covers the fundamental components of ML accelerators, including systolic-array-based tensor units (TU), reduction trees (RT), and 1D vector units (VU). NeuroMeter produces accurate modeling results, with average power and area estimation errors below 10% and 17%, respectively, when validated against TPU-v1, TPU-v2, and Eyeriss. Leveraging NeuroMeter’s new capabilities for architecting manycore ML accelerators, this paper presents the first in-depth study of the design space and tradeoffs of “Brawny and Wimpy” inference accelerators in datacenter scenarios, yielding insights that are otherwise difficult to discover without NeuroMeter. Our study shows that brawny designs with 64x64 systolic arrays are the most performant and efficient for inference tasks in the 28nm datacenter architectural space with a 500 mm² die area budget. Our study also reveals important tradeoffs between performance and efficiency. For datacenter accelerators with low-batch inference, a small (~16%) sacrifice of system performance (in achieved Tera OPerations per Second, aka TOPS) can lead to more than a 2x efficiency improvement (in achieved TOPS/TCO). To showcase NeuroMeter’s capability to model a wide range of diverse ML accelerator architectures, we also conduct a follow-on mini-case study on the implications of sparsity for different ML accelerators, demonstrating that wimpier accelerator architectures benefit more readily from sparsity processing despite their lower achievable raw energy efficiency.
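
As a rough illustration of the quantities the abstract discusses, the Python sketch below computes the peak TOPS of a systolic-array tensor unit and shows how a ~16% drop in achieved TOPS can still more than double TOPS/TCO when the slower design's total cost of ownership is less than half. This is not NeuroMeter's actual API; the function names, the 1 GHz clock, and the cost figures are assumptions chosen only to make the arithmetic concrete.

    # Hypothetical back-of-the-envelope sketch, not NeuroMeter's interface.
    # All clock frequencies and dollar figures below are illustrative.

    def peak_tops(rows: int, cols: int, freq_ghz: float) -> float:
        """Peak throughput of one systolic array in Tera-OPs/s.

        Each processing element performs one multiply-accumulate
        (counted as 2 ops) per cycle.
        """
        ops_per_cycle = 2 * rows * cols           # MAC = multiply + add
        return ops_per_cycle * freq_ghz / 1e3    # Giga-ops/s -> Tera-ops/s

    def tops_per_tco(achieved_tops: float, tco_dollars: float) -> float:
        """Efficiency metric from the study: achieved TOPS per TCO dollar."""
        return achieved_tops / tco_dollars

    # A "brawny" 64x64 array at an assumed 1 GHz:
    # 2 * 64 * 64 * 1e9 ops/s = 8.192 TOPS peak.
    brawny = peak_tops(64, 64, 1.0)

    # Shape of the reported tradeoff (made-up units): giving up ~16% of
    # achieved TOPS for a design with well under half the TCO more than
    # doubles TOPS/TCO.
    baseline_eff = tops_per_tco(100.0, 10.0)        # 10.0 TOPS per dollar
    tradeoff_eff = tops_per_tco(100.0 * 0.84, 4.0)  # 21.0 TOPS per dollar

    print(f"brawny 64x64 peak: {brawny:.3f} TOPS")
    print(f"efficiency gain: {tradeoff_eff / baseline_eff:.2f}x")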
