High-Throughput CNN Inference on Embedded ARM Big.LITTLE Multicore Processors

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Abstract

Internet of Things edge intelligence requires convolutional neural network (CNN) inference to take place on the edge devices themselves. The ARM big.LITTLE architecture is at the heart of prevalent commercial edge devices. It comprises single-ISA heterogeneous cores grouped into multiple homogeneous clusters that enable power and performance tradeoffs. All cores are expected to be employed simultaneously during inference to attain maximal throughput. However, the high communication overhead incurred when the computations of convolution kernels are parallelized across clusters is detrimental to throughput. We present an alternative framework, called Pipe-it, that employs a pipelined design to split the convolutional layers across clusters while limiting the parallelization of their respective kernels to the assigned cluster. We develop a performance-prediction model that uses only the convolutional layer descriptors to predict the execution time of each layer individually on every permitted core configuration (type and count). Pipe-it then exploits these predictions to create a balanced pipeline using an efficient design space exploration algorithm. Pipe-it on average delivers 39% higher throughput than the highest antecedent throughput.
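The recipe the abstract describes (predict each layer's execution time on every cluster configuration, then search for a split that balances the pipeline stages) can be made concrete with a short sketch. The Python below is a minimal illustration, not the authors' implementation: the MACs-based latency model, all constants, and the restriction to a single split point between a big-cluster stage and a LITTLE-cluster stage are assumptions made for brevity.

```python
# Minimal sketch of Pipe-it's two ingredients (illustrative, not the
# authors' code): (1) a latency model driven only by convolutional-layer
# descriptors, and (2) a search for the pipeline split whose slower stage
# (the throughput bottleneck) is as fast as possible.

from dataclasses import dataclass

@dataclass
class ConvLayer:
    """Descriptor of one convolutional layer (illustrative fields)."""
    in_channels: int
    out_channels: int
    kernel: int     # kernel height/width, assumed square
    out_size: int   # output feature-map height/width, assumed square

def macs(layer: ConvLayer) -> float:
    """Multiply-accumulate count of the layer, the usual workload proxy."""
    return (layer.in_channels * layer.out_channels
            * layer.kernel ** 2 * layer.out_size ** 2)

def predict_ms(layer: ConvLayer, throughput_macs_per_ms: float) -> float:
    """Hypothetical latency model: layer time = MACs / cluster throughput.
    Pipe-it's real model is fitted per core type and core count; a single
    constant throughput per cluster stands in for it here."""
    return macs(layer) / throughput_macs_per_ms

def best_two_stage_split(layers, big_tp, little_tp):
    """Enumerate every split of the layer sequence into a big-cluster stage
    followed by a LITTLE-cluster stage, and keep the split whose slower
    stage is fastest (pipeline throughput = 1 / max stage time)."""
    best = None
    for cut in range(1, len(layers)):
        t_big = sum(predict_ms(l, big_tp) for l in layers[:cut])
        t_little = sum(predict_ms(l, little_tp) for l in layers[cut:])
        bottleneck = max(t_big, t_little)
        if best is None or bottleneck < best[1]:
            best = (cut, bottleneck)
    return best  # (split index, bottleneck stage time in ms)

if __name__ == "__main__":
    # Toy VGG-like layer list; shapes and throughputs are made up.
    net = [ConvLayer(3, 64, 3, 224), ConvLayer(64, 128, 3, 112),
           ConvLayer(128, 256, 3, 56), ConvLayer(256, 512, 3, 28)]
    cut, t = best_two_stage_split(net, big_tp=5e6, little_tp=2e6)
    print(f"layers [0,{cut}) on big, [{cut},{len(net)}) on LITTLE; "
          f"pipeline throughput ~ {1000.0 / t:.2f} images/s")
```

Pipe-it's actual design space is richer than this sketch: layers may map to more than two stages and each stage may use any permitted core type and count, which is why the paper replaces the exhaustive enumeration above with an efficient design space exploration algorithm.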
