IEEE International Symposium on Computer Architecture and High Performance Computing

Design Space Exploration of Accelerators and End-to-End DNN Evaluation with TFLITE-SOC

Abstract

Recently, there has been rapidly growing demand for faster machine learning (ML) processing in data centers and for migrating ML inference applications to edge devices. These developments have prompted both industry and academia to explore custom accelerators that optimize ML execution for performance and power. However, identifying which accelerator is best equipped for a particular ML task is challenging, especially given the growing range of ML tasks, the number of target environments, and the limited number of integrated modeling tools. To tackle this issue, it is of paramount importance to provide the computer architecture research community with a common framework capable of performing a comprehensive, uniform, and fair comparison across different accelerator designs targeting a particular ML task. To this end, we propose a new framework named TFLITE-SOC (System On Chip) that integrates a lightweight system modeling library (SystemC), used for fast design space exploration of custom ML accelerators, into the build/execution environment of TensorFlow Lite (TFLite), a widely used framework for ML inference. With this approach, new accelerators developed in SystemC can be modeled and evaluated by leveraging the language's hierarchical design capabilities, resulting in faster design prototyping. Furthermore, any accelerator designed with TFLITE-SOC can be benchmarked for inference with any TFLite-compatible DNN model, which enables end-to-end DNN processing and detailed (i.e., per-layer) performance analysis. In addition to rapid prototyping, integrated benchmarking, and a range of platform configurations, TFLITE-SOC offers comprehensive analysis of accelerator occupancy and execution time breakdown, as well as a rich set of modules that new accelerators can use to implement scale-up studies and optimized memory transfer protocols. We present our framework and demonstrate its utility by exploring the design space of a TPU-like systolic array and describing possible directions for optimization. Using a compression technique, we implement an optimization that reduces memory traffic between DRAM and on-device buffers. Compared to the baseline accelerator, our optimized design shows up to 1.26x speedup on accelerated operations and up to 1.19x speedup on end-to-end DNN execution.
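To give a flavor of the kind of model the SystemC layer enables, below is a minimal, self-contained sketch of a single multiply-accumulate processing element (PE) of a TPU-like systolic array. The module, port, and signal names are illustrative assumptions for this sketch, not identifiers from the TFLITE-SOC codebase; a full array model would instantiate a grid of such PEs and wire neighboring ports together hierarchically.

```cpp
// Hypothetical sketch (not from the TFLITE-SOC codebase): one MAC
// processing element of a TPU-like systolic array, written as a
// SystemC module of the kind a hierarchical array model composes.
#include <systemc.h>

SC_MODULE(MacPE) {
    sc_in<bool>        clk;
    sc_in<sc_int<8>>   a_in;    // activation, flows left-to-right
    sc_in<sc_int<8>>   w_in;    // weight, flows top-to-bottom
    sc_out<sc_int<8>>  a_out;   // forwarded to the right neighbor
    sc_out<sc_int<8>>  w_out;   // forwarded to the neighbor below
    sc_out<sc_int<32>> acc_out; // partial sum, kept in the PE

    sc_int<32> acc;

    void step() {
        acc += a_in.read() * w_in.read(); // multiply-accumulate
        a_out.write(a_in.read());         // pass operands along
        w_out.write(w_in.read());
        acc_out.write(acc);
    }

    SC_CTOR(MacPE) : acc(0) {
        SC_METHOD(step);
        sensitive << clk.pos();           // one MAC per clock edge
    }
};

int sc_main(int, char*[]) {
    sc_clock clk("clk", 1, SC_NS);
    sc_signal<sc_int<8>>  a_in, w_in, a_out, w_out;
    sc_signal<sc_int<32>> acc;

    MacPE pe("pe");
    pe.clk(clk);
    pe.a_in(a_in);   pe.w_in(w_in);
    pe.a_out(a_out); pe.w_out(w_out);
    pe.acc_out(acc);

    a_in.write(3);   // constant stimulus: accumulate 3*2 per cycle
    w_in.write(2);
    sc_start(4, SC_NS);
    // expect 3*2 accumulated once per posedge (24 with edges at 0..3 ns)
    std::cout << "partial sum after 4 ns: " << acc.read() << std::endl;
    return 0;
}
```

An output-stationary organization is assumed here (partial sums stay in the PE while operands flow through); the paper's accelerator may use a different dataflow.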
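The relationship between the two reported speedups admits a quick Amdahl's-law sanity check. Assuming, purely for illustration, that the 1.26x gain applies uniformly to the accelerated operations and that all other processing is unchanged, the fraction f of baseline end-to-end time spent in accelerated operations can be backed out as follows (an inference from the reported numbers, not a figure stated in the abstract):

```latex
% Amdahl's law: end-to-end speedup S, given fraction f of baseline
% time in the accelerated operations and local speedup s on them:
%   S = 1 / ((1 - f) + f/s)
% Solving for f with S = 1.19 (end-to-end) and s = 1.26 (accelerated ops):
\[
  f \;=\; \frac{1 - 1/S}{1 - 1/s}
    \;=\; \frac{1 - 1/1.19}{1 - 1/1.26}
    \;\approx\; 0.77
\]
```

That is, under this assumption roughly three quarters of the baseline end-to-end time on the benchmarked model would be spent in operations the accelerator covers.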
