International Symposium on Quality Electronic Design

Accelerating Transformer-based Deep Learning Models on FPGAs using Column Balanced Block Pruning

Abstract

Although Transformer-based language representations achieve state-of-the-art accuracy on various natural language processing (NLP) tasks, the large model size has been challenging for resource-constrained computing platforms. Weight pruning, a popular and effective technique for reducing the number of weight parameters and accelerating the Transformer, has been investigated on GPUs. However, Transformer acceleration using weight pruning on field-programmable gate arrays (FPGAs) remains unexplored. This paper investigates column balanced block-wise pruning on the Transformer and designs an FPGA acceleration engine that customizes the balanced block-wise matrix multiplication. We implement the Transformer model with proper hardware scheduling, and the experiments show that Transformer inference on the FPGA achieves 10.35 ms latency with a batch size of 32, a $10.96\times$ speedup compared to the CPU platform and a $2.08\times$ speedup compared to the GPU platform.
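The column balanced block-wise pattern constrains pruning so that every block column of a weight matrix keeps the same number of nonzero blocks, which keeps the per-column workload uniform for a hardware matrix-multiplication engine. Below is a minimal NumPy sketch of that idea, not the paper's implementation: the block size of 16, the L1-norm block score, and the 50% keep ratio are illustrative assumptions.

import numpy as np

def column_balanced_block_prune(weight, block_size=16, keep_ratio=0.5):
    # Prune a weight matrix block by block, keeping the same number of
    # blocks in every block column so the per-column workload stays uniform.
    # block_size and keep_ratio are illustrative, not the paper's values.
    rows, cols = weight.shape
    assert rows % block_size == 0 and cols % block_size == 0
    n_block_rows = rows // block_size
    n_block_cols = cols // block_size
    n_keep = max(1, int(round(n_block_rows * keep_ratio)))

    mask = np.zeros_like(weight)
    for bc in range(n_block_cols):
        col_slice = slice(bc * block_size, (bc + 1) * block_size)
        # Score every block in this block column by its L1 norm.
        scores = np.array([
            np.abs(weight[br * block_size:(br + 1) * block_size, col_slice]).sum()
            for br in range(n_block_rows)
        ])
        # Keep the n_keep highest-scoring blocks and zero out the rest.
        for br in np.argsort(scores)[-n_keep:]:
            mask[br * block_size:(br + 1) * block_size, col_slice] = 1.0
    return weight * mask, mask

# Example: a 64x64 matrix where each block column keeps half of its 4 blocks.
w = np.random.randn(64, 64)
pruned, mask = column_balanced_block_prune(w, block_size=16, keep_ratio=0.5)
# Count the surviving blocks per block column; every entry is 2.
print(mask.reshape(4, 16, 4, 16).any(axis=(1, 3)).sum(axis=0))

Because each block column retains exactly the same number of blocks, the accelerator can assign a fixed amount of work to each column of processing elements, which is what makes the balanced pattern amenable to the customized block-wise matrix multiplication described in the paper.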
