IEEE Symposium on VLSI Circuits

A 3.0 TFLOPS 0.62V Scalable Processor Core for High Compute Utilization AI Training and Inference



Abstract

A processor core is presented for AI training and inference products. Leading-edge compute efficiency is achieved for robust fp16 training via efficient heterogeneous 2-D systolic array-SIMD compute engines leveraging compact DLFloat16 FPUs. Architectural flexibility is maintained for very high compute utilization across neural network topologies. A modular dual-corelet architecture with a shared scratchpad and a software-controlled network/memory interface enables scalability to many-core SoCs and large-scale systems. The 14nm AI core achieves fp16 peak performance of 3.0 TFLOPS at 0.62V and 1.4 TFLOPS/W at 0.54V.
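To illustrate the 2-D systolic array dataflow the abstract refers to, here is a minimal software sketch of an output-stationary systolic matrix multiply. This is a hypothetical illustration of the general technique only: the array dimensions, dataflow (output-stationary), and use of NumPy floats are assumptions for clarity, not details of the paper's DLFloat16 engine.

```python
import numpy as np

def systolic_matmul(A, B):
    """Simulate an N x M output-stationary systolic array computing C = A @ B.

    Each PE (i, j) holds one accumulator for C[i, j]. Rows of A skew in
    from the west edge, columns of B from the north edge; every cycle,
    each PE multiplies its two operand registers and accumulates.
    Sketch only -- sizes and dataflow are illustrative assumptions.
    """
    N, K = A.shape
    K2, M = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((N, M), dtype=float)
    a_reg = np.zeros((N, M), dtype=float)  # operand flowing west -> east
    b_reg = np.zeros((N, M), dtype=float)  # operand flowing north -> south
    # K operand waves plus (N - 1) + (M - 1) skew cycles to drain the array.
    for t in range(K + N + M - 2):
        # Shift operands one PE east / south.
        a_reg[:, 1:] = a_reg[:, :-1]
        b_reg[1:, :] = b_reg[:-1, :]
        # Inject skewed operands at the edges (zero-padded outside range).
        for i in range(N):
            k = t - i
            a_reg[i, 0] = A[i, k] if 0 <= k < K else 0.0
        for j in range(M):
            k = t - j
            b_reg[0, j] = B[k, j] if 0 <= k < K else 0.0
        # Every PE performs one multiply-accumulate per cycle.
        C += a_reg * b_reg
    return C
```

With the skewed injection, PE (i, j) at cycle t sees A[i, t-i-j] and B[t-i-j, j], so the same reduction index k meets in each PE and the accumulators converge to A @ B after the array drains.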
