High-Throughput CNN Inference on Embedded ARM Big.LITTLE Multicore Processors

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Abstract

Internet of Things edge intelligence requires convolutional neural network (CNN) inference to take place on the edge devices themselves. The ARM big.LITTLE architecture is at the heart of prevalent commercial edge devices. It comprises single-ISA heterogeneous cores grouped into multiple homogeneous clusters that enable power and performance tradeoffs. All cores are expected to be employed simultaneously during inference to attain maximal throughput. However, the high communication overhead incurred when the computations of convolution kernels are parallelized across clusters is detrimental to throughput. We present an alternative framework, called Pipe-it, that employs a pipelined design to split the convolutional layers across clusters while limiting the parallelization of their respective kernels to the assigned cluster. We develop a performance-prediction model that uses only the convolutional layer descriptors to predict the execution time of each layer individually on every permitted core configuration (type and count). Pipe-it then exploits these predictions to create a balanced pipeline using an efficient design space exploration algorithm. Pipe-it on average delivers 39% higher throughput than the highest antecedent throughput.
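The recipe the abstract describes (predict each layer's execution time on every cluster configuration, then search for a split that balances the pipeline stages) can be made concrete with a short sketch. The Python below is a minimal illustration, not the authors' implementation: the MACs-based latency model, all constants, and the restriction to a single split point between a big-cluster stage and a LITTLE-cluster stage are assumptions made for brevity.

```python
# Minimal sketch of Pipe-it's two ingredients (illustrative, not the
# authors' code): (1) a latency model driven only by convolutional-layer
# descriptors, and (2) a search for the pipeline split whose slower stage
# (the throughput bottleneck) is as fast as possible.

from dataclasses import dataclass

@dataclass
class ConvLayer:
    """Descriptor of one convolutional layer (illustrative fields)."""
    in_channels: int
    out_channels: int
    kernel: int     # kernel height/width, assumed square
    out_size: int   # output feature-map height/width, assumed square

def macs(layer: ConvLayer) -> float:
    """Multiply-accumulate count of the layer, the usual workload proxy."""
    return (layer.in_channels * layer.out_channels
            * layer.kernel ** 2 * layer.out_size ** 2)

def predict_ms(layer: ConvLayer, throughput_macs_per_ms: float) -> float:
    """Hypothetical latency model: layer time = MACs / cluster throughput.
    Pipe-it's real model is fitted per core type and core count; a single
    constant throughput per cluster stands in for it here."""
    return macs(layer) / throughput_macs_per_ms

def best_two_stage_split(layers, big_tp, little_tp):
    """Enumerate every split of the layer sequence into a big-cluster stage
    followed by a LITTLE-cluster stage, and keep the split whose slower
    stage is fastest (pipeline throughput = 1 / max stage time)."""
    best = None
    for cut in range(1, len(layers)):
        t_big = sum(predict_ms(l, big_tp) for l in layers[:cut])
        t_little = sum(predict_ms(l, little_tp) for l in layers[cut:])
        bottleneck = max(t_big, t_little)
        if best is None or bottleneck < best[1]:
            best = (cut, bottleneck)
    return best  # (split index, bottleneck stage time in ms)

if __name__ == "__main__":
    # Toy VGG-like layer list; shapes and throughputs are made up.
    net = [ConvLayer(3, 64, 3, 224), ConvLayer(64, 128, 3, 112),
           ConvLayer(128, 256, 3, 56), ConvLayer(256, 512, 3, 28)]
    cut, t = best_two_stage_split(net, big_tp=5e6, little_tp=2e6)
    print(f"layers [0,{cut}) on big, [{cut},{len(net)}) on LITTLE; "
          f"pipeline throughput ~ {1000.0 / t:.2f} images/s")
```

Pipe-it's actual design space is richer than this sketch: layers may map to more than two stages and each stage may use any permitted core type and count, which is why the paper replaces the exhaustive enumeration above with an efficient design space exploration algorithm.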
