International Symposium on Quality Electronic Design

Accelerating Transformer-based Deep Learning Models on FPGAs using Column Balanced Block Pruning

Abstract

Although Transformer-based language representations achieve state-of-the-art accuracy on various natural language processing (NLP) tasks, the large model size has been challenging for resource-constrained computing platforms. Weight pruning, a popular and effective technique for reducing the number of weight parameters and accelerating the Transformer, has been investigated on GPUs. However, Transformer acceleration using weight pruning on field-programmable gate arrays (FPGAs) remains unexplored. This paper investigates column balanced block-wise pruning on the Transformer and designs an FPGA acceleration engine that customizes the balanced block-wise matrix multiplication. We implement the Transformer model with proper hardware scheduling, and the experiments show that Transformer inference on the FPGA achieves 10.35 ms latency with a batch size of 32, a $10.96\times$ speedup compared to the CPU platform and a $2.08\times$ speedup compared to the GPU platform.
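The column balanced block-wise pattern constrains pruning so that every block column of a weight matrix keeps the same number of nonzero blocks, which keeps the per-column workload uniform for a hardware matrix-multiplication engine. Below is a minimal NumPy sketch of that idea, not the paper's implementation: the block size of 16, the L1-norm block score, and the 50% keep ratio are illustrative assumptions.

import numpy as np

def column_balanced_block_prune(weight, block_size=16, keep_ratio=0.5):
    # Prune a weight matrix block by block, keeping the same number of
    # blocks in every block column so the per-column workload stays uniform.
    # block_size and keep_ratio are illustrative, not the paper's values.
    rows, cols = weight.shape
    assert rows % block_size == 0 and cols % block_size == 0
    n_block_rows = rows // block_size
    n_block_cols = cols // block_size
    n_keep = max(1, int(round(n_block_rows * keep_ratio)))

    mask = np.zeros_like(weight)
    for bc in range(n_block_cols):
        col_slice = slice(bc * block_size, (bc + 1) * block_size)
        # Score every block in this block column by its L1 norm.
        scores = np.array([
            np.abs(weight[br * block_size:(br + 1) * block_size, col_slice]).sum()
            for br in range(n_block_rows)
        ])
        # Keep the n_keep highest-scoring blocks and zero out the rest.
        for br in np.argsort(scores)[-n_keep:]:
            mask[br * block_size:(br + 1) * block_size, col_slice] = 1.0
    return weight * mask, mask

# Example: a 64x64 matrix where each block column keeps half of its 4 blocks.
w = np.random.randn(64, 64)
pruned, mask = column_balanced_block_prune(w, block_size=16, keep_ratio=0.5)
# Count the surviving blocks per block column; every entry is 2.
print(mask.reshape(4, 16, 4, 16).any(axis=(1, 3)).sum(axis=0))

Because each block column retains exactly the same number of blocks, the accelerator can assign a fixed amount of work to each column of processing elements, which is what makes the balanced pattern amenable to the customized block-wise matrix multiplication described in the paper.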
