IEEE Transactions on Very Large Scale Integration (VLSI) Systems

High-Performance CNN Accelerator on FPGA Using Unified Winograd-GEMM Architecture



Abstract

Deep neural networks have revolutionized a variety of applications across domains such as autonomous vehicles, weather forecasting, cancer detection, surveillance, and traffic management. The convolutional neural network (CNN) is the state-of-the-art technique for many machine learning tasks in the image and video processing domains. Deploying CNNs on embedded systems with limited processing power and small power budgets is a challenging task. Recent studies have shown the effectiveness of the field-programmable gate array (FPGA) as a hardware accelerator for CNNs, delivering high performance at low power budgets. The majority of computations in CNNs involve 2-D convolution. The Winograd minimal filtering algorithm is the most efficient technique for computing convolutions with small filter sizes. CNNs also contain fully connected layers, which are computed using general matrix multiplication (GEMM). In this article, we propose a unified architecture named UniWiG, in which both Winograd-based convolution and GEMM can be accelerated using the same set of processing elements. This approach leads to efficient utilization of FPGA hardware resources while computing all layers in the CNN. The proposed architecture shows performance improvements in the range of 1.4x to 4.02x with only 13% additional FPGA resources with respect to the baseline GEMM-based architecture. We have mapped popular CNN models such as AlexNet and VGG-16 onto the proposed accelerator, and the measured performance compares favorably with other state-of-the-art implementations. We have also analyzed the vulnerability of the accelerator to side-channel attacks. Preliminary investigations show that the UniWiG architecture is more robust to memory side-channel attacks than direct convolution-based techniques.
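To illustrate the Winograd minimal filtering idea the abstract builds on (not the paper's FPGA implementation), here is a minimal NumPy sketch of the 1-D case F(2, 3), which produces two convolution outputs of a 3-tap filter with 4 multiplications instead of the 6 required by the direct method. The transform matrices are the standard ones for F(2, 3); the function name is chosen for this example.

```python
import numpy as np

# Standard Winograd F(2, 3) transform matrices.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)   # input transform
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])               # filter transform
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)    # inverse (output) transform

def winograd_f23(d, g):
    """d: 4-element input tile, g: 3-tap filter -> 2 outputs."""
    U = G @ g            # transformed filter (reusable across tiles)
    V = BT @ d           # transformed input tile
    return AT @ (U * V)  # 4 element-wise multiplies, then inverse transform

# Sanity check against direct sliding-window convolution (correlation).
d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.5, -1.0])
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(winograd_f23(d, g), direct)
```

The 2-D variant F(2x2, 3x3) used for CNN layers nests the same transforms over tiles of the feature map; the element-wise product stage is what a unified architecture such as UniWiG can map onto the same multiplier array that performs GEMM.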
