IEEE Journal on Emerging and Selected Topics in Circuits and Systems

Optimizing Temporal Convolutional Network Inference on FPGA-Based Accelerators



Abstract

Convolutional Neural Networks (CNNs) are extensively used in a wide range of applications, most commonly in computer vision tasks such as image and video classification, recognition, and segmentation. Recent research results demonstrate that multi-layer (deep) networks involving mono-dimensional convolutions and dilation can be used effectively in time-series and sequence classification and segmentation, as well as in other sequence modeling tasks. These structures, commonly referred to as Temporal Convolutional Networks (TCNs), are an extremely promising alternative to the recurrent architectures traditionally used across a broad range of sequence modeling tasks. While FPGA-based inference accelerators for classic CNNs are widespread, the literature lacks a quantitative evaluation of their usability for inference on TCN models. In this paper we present such an evaluation, taking as a reference a CNN accelerator with specific features supporting TCN kernels, and using a set of state-of-the-art TCNs as a benchmark. Experimental results show that, during TCN execution, operational intensity can be critical for overall performance. We propose a convolution scheduling based on batch processing that can boost efficiency up to 96% of theoretical peak performance. Overall we achieve up to 111.8 GOPS and a power efficiency of 33.8 GOPS/W on an Ultrascale+ ZU3EG (up to 10x speedup and 3x power-efficiency improvement with respect to a pure software implementation).
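The building block the abstract refers to, a causal dilated mono-dimensional convolution, can be sketched as follows. This is an illustrative reference implementation of the operation itself, not the accelerator's kernel; the function name and single-channel formulation are assumptions for clarity:

```python
def causal_dilated_conv1d(x, w, dilation=1):
    """Causal 1D convolution with dilation, the core TCN operation.

    Left-pads the input so the output has the same length as x and
    y[t] depends only on x[t] and earlier samples (no future leakage).
    """
    k = len(w)
    pad = (k - 1) * dilation          # receptive-field reach into the past
    xp = [0.0] * pad + list(x)        # zero-pad on the left only
    # w[-1] multiplies the current sample; earlier taps reach back
    # in strides of `dilation`.
    return [sum(w[j] * xp[t + j * dilation] for j in range(k))
            for t in range(len(x))]
```

Stacking such layers with exponentially growing dilation (1, 2, 4, ...) is what gives a TCN a large receptive field with few layers.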
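The claim that operational intensity drives performance, and that batching helps, can be made concrete with a back-of-the-envelope model. The cost model below is an illustrative assumption (16-bit operands, each tensor crossing off-chip memory once, weights amortized across a batch), not the paper's measured figures:

```python
def operational_intensity(t, c_in, c_out, k, batch=1, bytes_per_elem=2):
    """Ops per byte of off-chip traffic for one 1D convolution layer.

    t        : sequence length
    c_in/out : input/output channels
    k        : kernel width
    batch    : number of sequences processed per weight load

    Assumes inputs, outputs, and weights each cross the memory
    boundary once; batching amortizes the weight transfer.
    """
    macs = batch * t * c_out * c_in * k               # multiply-accumulates
    ops = 2 * macs                                    # one MAC = 2 ops
    traffic = bytes_per_elem * (batch * t * (c_in + c_out)  # activations
                                + c_in * c_out * k)         # weights, once
    return ops / traffic
```

Under this model, operational intensity rises monotonically with the batch size, because the fixed weight traffic is spread over more useful work; this is the effect the proposed batch-based convolution scheduling exploits.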
