Computer Architecture News

A Dynamically Configurable Coprocessor for Convolutional Neural Networks


Abstract

Convolutional neural network (CNN) applications range from recognition and reasoning (such as handwriting recognition, facial expression recognition and video surveillance) to intelligent text applications such as semantic text analysis and natural language processing applications. Two key observations drive the design of a new architecture for CNN. First, CNN workloads exhibit a widely varying mix of three types of parallelism: parallelism within a convolution operation, intra-output parallelism where multiple input sources (features) are combined to create a single output, and inter-output parallelism where multiple, independent outputs (features) are computed simultaneously. Workloads differ significantly across different CNN applications, and across different layers of a CNN. Second, the number of processing elements in an architecture continues to scale (as per Moore's law) much faster than the off-chip memory bandwidth (or pin count) of chips. Based on these two observations, we show that for a given number of processing elements and off-chip memory bandwidth, a new CNN hardware architecture that dynamically configures the hardware on the fly to match the specific mix of parallelism in a given workload gives the best throughput performance. Our CNN compiler automatically translates a high-abstraction network specification into a parallel microprogram (a sequence of low-level VLIW instructions) that is mapped, scheduled and executed by the coprocessor. Compared to a 2.3 GHz quad-core, dual-socket Intel Xeon, a 1.35 GHz C870 GPU, and a 200 MHz FPGA implementation, our 120 MHz dynamically configurable architecture is 4x to 8x faster. This is the first CNN architecture to achieve real-time video stream processing (25 to 30 frames per second) on a wide range of object detection and recognition tasks.
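The three forms of parallelism the abstract enumerates all appear in the loop nest of a single CNN layer. The following plain-Python sketch is illustrative only (it is not the paper's coprocessor or microprogram; all names and shapes are assumptions) and marks where each form arises:

```python
# Illustrative sketch of the three kinds of parallelism in a CNN layer.
# Not the paper's architecture; plain nested loops with hypothetical names.

def conv_layer(inputs, kernels):
    """inputs: list of 2D input feature maps (lists of lists of floats).
    kernels[o][i]: 2D kernel combining input map i into output map o."""
    n_in, n_out = len(inputs), len(kernels)
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    h = len(inputs[0]) - kh + 1          # valid-convolution output height
    w = len(inputs[0][0]) - kw + 1       # valid-convolution output width
    outputs = []
    # Inter-output parallelism: each output map o is independent of the others.
    for o in range(n_out):
        out = [[0.0] * w for _ in range(h)]
        # Intra-output parallelism: contributions from every input map i
        # are accumulated into the same output map.
        for i in range(n_in):
            for y in range(h):
                for x in range(w):
                    # Parallelism within one convolution: the kh*kw
                    # multiply-accumulates below are mutually independent.
                    acc = 0.0
                    for ky in range(kh):
                        for kx in range(kw):
                            acc += inputs[i][y + ky][x + kx] * kernels[o][i][ky][kx]
                    out[y][x] += acc
        outputs.append(out)
    return outputs
```

A dynamically configurable design can assign its processing elements across these three loop levels in different proportions per layer, which is the workload-matching the abstract refers to.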
