Computer Architecture News

A Dynamically Configurable Coprocessor for Convolutional Neural Networks


Abstract

Convolutional neural network (CNN) applications range from recognition and reasoning (such as handwriting recognition, facial expression recognition and video surveillance) to intelligent text applications such as semantic text analysis and natural language processing applications. Two key observations drive the design of a new architecture for CNN. First, CNN workloads exhibit a widely varying mix of three types of parallelism: parallelism within a convolution operation, intra-output parallelism where multiple input sources (features) are combined to create a single output, and inter-output parallelism where multiple, independent outputs (features) are computed simultaneously. Workloads differ significantly across different CNN applications, and across different layers of a CNN. Second, the number of processing elements in an architecture continues to scale (as per Moore's law) much faster than the off-chip memory bandwidth (or pin count) of chips. Based on these two observations, we show that for a given number of processing elements and off-chip memory bandwidth, a new CNN hardware architecture that dynamically configures the hardware on the fly to match the specific mix of parallelism in a given workload gives the best throughput performance. Our CNN compiler automatically translates a high-abstraction network specification into a parallel microprogram (a sequence of low-level VLIW instructions) that is mapped, scheduled and executed by the coprocessor. Compared to a 2.3 GHz quad-core, dual-socket Intel Xeon, a 1.35 GHz C870 GPU, and a 200 MHz FPGA implementation, our 120 MHz dynamically configurable architecture is 4x to 8x faster. This is the first CNN architecture to achieve real-time video stream processing (25 to 30 frames per second) on a wide range of object detection and recognition tasks.
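The three forms of parallelism the abstract enumerates all appear in the loop nest of a single CNN layer. The following plain-Python sketch is illustrative only (it is not the paper's coprocessor or microprogram; all names and shapes are assumptions) and marks where each form arises:

```python
# Illustrative sketch of the three kinds of parallelism in a CNN layer.
# Not the paper's architecture; plain nested loops with hypothetical names.

def conv_layer(inputs, kernels):
    """inputs: list of 2D input feature maps (lists of lists of floats).
    kernels[o][i]: 2D kernel combining input map i into output map o."""
    n_in, n_out = len(inputs), len(kernels)
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    h = len(inputs[0]) - kh + 1          # valid-convolution output height
    w = len(inputs[0][0]) - kw + 1       # valid-convolution output width
    outputs = []
    # Inter-output parallelism: each output map o is independent of the others.
    for o in range(n_out):
        out = [[0.0] * w for _ in range(h)]
        # Intra-output parallelism: contributions from every input map i
        # are accumulated into the same output map.
        for i in range(n_in):
            for y in range(h):
                for x in range(w):
                    # Parallelism within one convolution: the kh*kw
                    # multiply-accumulates below are mutually independent.
                    acc = 0.0
                    for ky in range(kh):
                        for kx in range(kw):
                            acc += inputs[i][y + ky][x + kx] * kernels[o][i][ky][kx]
                    out[y][x] += acc
        outputs.append(out)
    return outputs
```

A dynamically configurable design can assign its processing elements across these three loop levels in different proportions per layer, which is the workload-matching the abstract refers to.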
