International Conference on Field Programmable Logic and Applications

In-Package Domain-Specific ASICs for Intel® Stratix® 10 FPGAs: A Case Study of Accelerating Deep Learning Using TensorTile ASIC

Abstract

FPGAs or ASICs? FPGAs are extremely flexible, while ASICs offer top efficiency. We believe that FPGAs and ASICs are better together, offering both flexibility and efficiency. We propose single-package heterogeneous 2.5D integration of FPGAs and ASICs using Intel's Embedded Multi-Die Interconnect Bridge (EMIB). Since the ASICs are chips separate from the FPGA, this approach (1) does not change the FPGA fabric, allowing re-use of the existing ecosystem (FPGA chips, packaging, boards, software), and (2) allows freedom in ASIC design (area, frequency, process, etc. are unconstrained by the FPGA fabric). Intel® Stratix® 10 FPGAs already have EMIBs, enabling single-package integration with other chips, or “tiles”. We propose leveraging them to mix and match any domain-specific ASICs with Stratix 10 FPGAs. In particular, this work focuses on the deep learning (DL) domain, which demands efficient tensor (matrix/vector) operations. We propose TensorTile ASICs for Stratix 10 FPGAs to provide ASIC-level tensor performance, while relying on the FPGA's flexibility for application-specific operations (e.g., Winograd). Our evaluation shows that (1) a small TensorTile offers much higher tensor throughput than a large Stratix 10-2800 FPGA; (2) mixing and matching FPGAs and TensorTiles provides scalable solutions (e.g., from ~69 peak INT8 TOPS with 1×TensorTile + a small Stratix 10-400 FPGA, to ~194 peak FP16 TOPS with 6×TensorTiles + a large Stratix 10-2800); and (3) when enhanced with 2×TensorTiles, Intel's DL FPGA design improves AlexNet performance by 4× and performance/Watt by 3.3×.
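Read together, the two AlexNet ratios in result (3) imply how much more power the TensorTile-enhanced design draws. Since performance/Watt is performance divided by power, a rough check, assuming both ratios come from the same AlexNet workload and treating the rounded figures as exact, gives:

```latex
\frac{P_{\text{enh}}}{P_{\text{base}}}
  = \frac{\mathrm{Perf}_{\text{enh}} / \mathrm{Perf}_{\text{base}}}
         {(\mathrm{Perf}/W)_{\text{enh}} / (\mathrm{Perf}/W)_{\text{base}}}
  = \frac{4}{3.3} \approx 1.21
```

Under these assumptions, the 2×TensorTile configuration would draw roughly 20% more power than the FPGA-only design while delivering 4× the throughput. This power ratio is inferred from the two reported ratios and is not stated directly in the abstract.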