IEEE International Symposium on Circuits and Systems

Efficient Fixed/Floating-Point Merged Mixed-Precision Multiply-Accumulate Unit for Deep Learning Processors

Abstract

Deep learning has attracted increasing attention in recent years, and many hardware architectures have been proposed for the efficient implementation of deep neural networks. The arithmetic unit, as the core processing element of such architectures, largely determines the functionality of the whole design. In this paper, an efficient fixed/floating-point merged multiply-accumulate (MAC) unit for deep learning processors is proposed. The proposed architecture supports 16-bit half-precision floating-point multiplication with 32-bit single-precision accumulation for the training operations of deep learning algorithms. In addition, within the same hardware, it also supports two parallel 8-bit fixed-point multiplications whose products are accumulated into a 32-bit fixed-point number, enabling higher throughput for the inference operations of deep learning algorithms. Compared to a half-precision multiply-accumulate unit that accumulates to single precision, the proposed architecture incurs only 4.6% area overhead. With the proposed MAC unit, a deep learning processor can support both training and high-throughput inference.
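To make the two operating modes concrete, the following Python/NumPy sketch (not from the paper; the function names and exact rounding details are illustrative assumptions) models the arithmetic behavior the abstract describes: an FP16 multiply accumulated at FP32 precision for training, and two parallel INT8 multiplies accumulated into a single 32-bit register for inference.

    import numpy as np

    def mac_fp16(acc: np.float32, a: np.float16, b: np.float16) -> np.float32:
        """Floating-point mode: FP16 x FP16 multiply with FP32 accumulation.

        The product of two FP16 values (11-bit significands) is exactly
        representable in FP32 (24-bit significand), so only the final
        addition rounds -- the precision the abstract describes for training.
        """
        product = np.float32(a) * np.float32(b)
        return np.float32(acc + product)

    def mac_2x_int8(acc: np.int32, a0: np.int8, b0: np.int8,
                    a1: np.int8, b1: np.int8) -> np.int32:
        """Fixed-point mode: two parallel INT8 x INT8 multiplies, with both
        products added into one 32-bit accumulator in a single MAC step,
        doubling multiply throughput for inference."""
        p0 = np.int32(a0) * np.int32(b0)  # widen before multiplying to avoid INT8 overflow
        p1 = np.int32(a1) * np.int32(b1)
        return np.int32(acc + p0 + p1)    # 32-bit accumulation, wrapping on overflow as hardware would

Note that the fixed-point mode performs two multiplications per call while the floating-point mode performs one, which is why the merged unit offers higher throughput for quantized inference within the same datapath.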
