IEEE Transactions on Very Large Scale Integration (VLSI) Systems

High-Performance CNN Accelerator on FPGA Using Unified Winograd-GEMM Architecture



Abstract

Deep neural networks have revolutionized a variety of applications across domains such as autonomous vehicles, weather forecasting, cancer detection, surveillance, and traffic management. The convolutional neural network (CNN) is the state-of-the-art technique for many machine learning tasks in the image and video processing domains. Deploying CNNs on embedded systems with limited processing power and small power budgets is a challenging task. Recent studies have shown the effectiveness of the field-programmable gate array (FPGA) as a hardware accelerator for CNNs, delivering high performance at low power budgets. The majority of computations in CNNs involve 2-D convolution. The Winograd minimal filtering algorithm is the most efficient technique for computing convolutions with small filter sizes. CNNs also contain fully connected layers, which are computed using general matrix multiplication (GEMM). In this article, we propose a unified architecture named UniWiG, in which both Winograd-based convolution and GEMM can be accelerated using the same set of processing elements. This approach leads to efficient utilization of FPGA hardware resources while computing all layers in the CNN. The proposed architecture shows performance improvements in the range of 1.4x to 4.02x with only 13% additional FPGA resources with respect to the baseline GEMM-based architecture. We have mapped popular CNN models such as AlexNet and VGG-16 onto the proposed accelerator, and the measured performance compares favorably with other state-of-the-art implementations. We have also analyzed the vulnerability of the accelerator to side-channel attacks. Preliminary investigations show that the UniWiG architecture is more robust to memory side-channel attacks than direct convolution-based techniques.
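To illustrate the Winograd minimal filtering idea the abstract builds on (not the paper's FPGA implementation), here is a minimal NumPy sketch of the 1-D case F(2, 3), which produces two convolution outputs of a 3-tap filter with 4 multiplications instead of the 6 required by the direct method. The transform matrices are the standard ones for F(2, 3); the function name is chosen for this example.

```python
import numpy as np

# Standard Winograd F(2, 3) transform matrices.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)   # input transform
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])               # filter transform
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)    # inverse (output) transform

def winograd_f23(d, g):
    """d: 4-element input tile, g: 3-tap filter -> 2 outputs."""
    U = G @ g            # transformed filter (reusable across tiles)
    V = BT @ d           # transformed input tile
    return AT @ (U * V)  # 4 element-wise multiplies, then inverse transform

# Sanity check against direct sliding-window convolution (correlation).
d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([1.0, 0.5, -1.0])
direct = np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                   d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
assert np.allclose(winograd_f23(d, g), direct)
```

The 2-D variant F(2x2, 3x3) used for CNN layers nests the same transforms over tiles of the feature map; the element-wise product stage is what a unified architecture such as UniWiG can map onto the same multiplier array that performs GEMM.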
