IEEE International Symposium on High Performance Computer Architecture

NeuroMeter: An Integrated Power, Area, and Timing Modeling Framework for Machine Learning Accelerators (Industry Track Paper)



Abstract

As Machine Learning (ML) becomes pervasive in the era of artificial intelligence, ML-specific tools and frameworks are required for architectural research. This paper introduces NeuroMeter, an integrated power, area, and timing modeling framework for ML accelerators. NeuroMeter models the detailed architecture of ML accelerators and generates fast and accurate estimates of power, area, and chip timing. It also enables runtime analysis of system-level performance and efficiency when runtime activity factors are provided. NeuroMeter’s micro-architecture model covers the fundamental components of ML accelerators, including systolic-array-based tensor units (TU), reduction trees (RT), and 1D vector units (VU). NeuroMeter produces accurate modeling results, with average power and area estimation errors below 10% and 17%, respectively, when validated against TPU-v1, TPU-v2, and Eyeriss. Leveraging NeuroMeter’s new capabilities for architecting manycore ML accelerators, this paper presents the first in-depth study of the design space and tradeoffs of “Brawny and Wimpy” inference accelerators in datacenter scenarios, yielding insights that are otherwise difficult to discover without NeuroMeter. Our study shows that brawny designs with 64x64 systolic arrays are the most performant and efficient for inference tasks in the 28nm datacenter architectural space with a 500 mm² die area budget. Our study also reveals important tradeoffs between performance and efficiency. For datacenter accelerators with low-batch inference, a small (~16%) sacrifice of system performance (in achieved Tera OPerations per Second, aka TOPS) can lead to more than a 2x efficiency improvement (in achieved TOPS/TCO). To showcase NeuroMeter’s capability to model a wide range of diverse ML accelerator architectures, we also conduct a follow-on mini-case study on the implications of sparsity for different ML accelerators, demonstrating that wimpier accelerator architectures benefit more readily from sparsity processing despite their lower achievable raw energy efficiency.
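
As a rough illustration of the quantities the abstract discusses, the Python sketch below computes the peak TOPS of a systolic-array tensor unit and shows how a ~16% drop in achieved TOPS can still more than double TOPS/TCO when the slower design's total cost of ownership is less than half. This is not NeuroMeter's actual API; the function names, the 1 GHz clock, and the cost figures are assumptions chosen only to make the arithmetic concrete.

    # Hypothetical back-of-the-envelope sketch, not NeuroMeter's interface.
    # All clock frequencies and dollar figures below are illustrative.

    def peak_tops(rows: int, cols: int, freq_ghz: float) -> float:
        """Peak throughput of one systolic array in Tera-OPs/s.

        Each processing element performs one multiply-accumulate
        (counted as 2 ops) per cycle.
        """
        ops_per_cycle = 2 * rows * cols           # MAC = multiply + add
        return ops_per_cycle * freq_ghz / 1e3    # Giga-ops/s -> Tera-ops/s

    def tops_per_tco(achieved_tops: float, tco_dollars: float) -> float:
        """Efficiency metric from the study: achieved TOPS per TCO dollar."""
        return achieved_tops / tco_dollars

    # A "brawny" 64x64 array at an assumed 1 GHz:
    # 2 * 64 * 64 * 1e9 ops/s = 8.192 TOPS peak.
    brawny = peak_tops(64, 64, 1.0)

    # Shape of the reported tradeoff (made-up units): giving up ~16% of
    # achieved TOPS for a design with well under half the TCO more than
    # doubles TOPS/TCO.
    baseline_eff = tops_per_tco(100.0, 10.0)        # 10.0 TOPS per dollar
    tradeoff_eff = tops_per_tco(100.0 * 0.84, 4.0)  # 21.0 TOPS per dollar

    print(f"brawny 64x64 peak: {brawny:.3f} TOPS")
    print(f"efficiency gain: {tradeoff_eff / baseline_eff:.2f}x")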
