IEEE Symposium on Computer Arithmetic

Intel Nervana Neural Network Processor-T (NNP-T) Fused Floating Point Many-Term Dot Product



Abstract

Intel’s Nervana Neural Network Processor for Training (NNP-T) contains at its core an advanced floating point dot product design to accelerate the matrix multiplication operations found in many AI applications. Each Matrix Processing Unit (MPU) on the Intel NNP-T can process a 32x32 BFloat16 matrix multiplication every 32 cycles, accumulating the result in single precision (FP32). To reduce hardware costs, the MPU uses a fused many-term floating point dot product design with block alignment of the many input terms during addition, resulting in a unique datapath with several interesting design trade-offs. In this paper, we describe the details of the MPU pipeline, discuss the trade-offs made in the design, and present information on the accuracy of the computation as compared to traditional FMA implementations.
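The block-alignment scheme the abstract describes can be sketched in software: every product in the dot product is shifted to the block's maximum exponent before a single wide integer sum, so terms far below the maximum lose low-order bits rather than being rounded term-by-term as in a chain of FMAs. The sketch below is a simplified behavioral model, not the NNP-T datapath; the function name, the 24-bit alignment width, and the use of Python floats in place of BFloat16 inputs are all illustrative assumptions.

```python
import math
import random

def block_aligned_dot(a, b, frac_bits=24):
    """Hedged model of a fused many-term dot product with block alignment:
    all products are aligned to the block's maximum exponent and summed as
    fixed-point integers, so small terms are truncated against the largest
    term instead of being rounded individually (illustrative width only)."""
    products = [x * y for x, y in zip(a, b)]
    nonzero = [p for p in products if p != 0.0]
    if not nonzero:
        return 0.0
    # math.frexp(p) -> (m, e) with 0.5 <= |m| < 1 and p == m * 2**e
    emax = max(math.frexp(p)[1] for p in nonzero)
    total = 0
    for p in nonzero:
        m, e = math.frexp(p)
        # scale the mantissa to a fixed-point integer, then shift right
        # to align it with the block's maximum exponent
        fixed = int(m * (1 << frac_bits))
        total += fixed >> (emax - e)
    return total / (1 << frac_bits) * 2.0 ** emax

random.seed(0)
a = [random.uniform(-1, 1) for _ in range(32)]
b = [random.uniform(-1, 1) for _ in range(32)]

fused = block_aligned_dot(a, b)
exact = math.fsum(x * y for x, y in zip(a, b))
print(fused, exact, abs(fused - exact))
```

With 32 terms of comparable magnitude the aligned sum tracks the exact result closely; the interesting accuracy trade-off the paper analyzes appears when term magnitudes vary widely, since bits shifted out during alignment are lost before the final FP32 rounding.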
