IEEE International Symposium on Circuits and Systems

Efficient Fixed/Floating-Point Merged Mixed-Precision Multiply-Accumulate Unit for Deep Learning Processors

Abstract

Deep learning has attracted increasing attention in recent years, and many hardware architectures have been proposed for the efficient implementation of deep neural networks. The arithmetic unit, as the core processing element of such architectures, largely determines the functionality of the whole design. In this paper, an efficient fixed/floating-point merged multiply-accumulate (MAC) unit for deep learning processors is proposed. The proposed architecture supports 16-bit half-precision floating-point multiplication with 32-bit single-precision accumulation for the training operations of deep learning algorithms. In addition, within the same hardware, it also supports two parallel 8-bit fixed-point multiplications whose products are accumulated into a 32-bit fixed-point number, enabling higher throughput for the inference operations of deep learning algorithms. Compared to a half-precision multiply-accumulate unit that accumulates to single precision, the proposed architecture incurs only 4.6% area overhead. With the proposed MAC unit, a deep learning processor can support both training and high-throughput inference.
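To make the two operating modes concrete, the following Python/NumPy sketch (not from the paper; the function names and exact rounding details are illustrative assumptions) models the arithmetic behavior the abstract describes: an FP16 multiply accumulated at FP32 precision for training, and two parallel INT8 multiplies accumulated into a single 32-bit register for inference.

    import numpy as np

    def mac_fp16(acc: np.float32, a: np.float16, b: np.float16) -> np.float32:
        """Floating-point mode: FP16 x FP16 multiply with FP32 accumulation.

        The product of two FP16 values (11-bit significands) is exactly
        representable in FP32 (24-bit significand), so only the final
        addition rounds -- the precision the abstract describes for training.
        """
        product = np.float32(a) * np.float32(b)
        return np.float32(acc + product)

    def mac_2x_int8(acc: np.int32, a0: np.int8, b0: np.int8,
                    a1: np.int8, b1: np.int8) -> np.int32:
        """Fixed-point mode: two parallel INT8 x INT8 multiplies, with both
        products added into one 32-bit accumulator in a single MAC step,
        doubling multiply throughput for inference."""
        p0 = np.int32(a0) * np.int32(b0)  # widen before multiplying to avoid INT8 overflow
        p1 = np.int32(a1) * np.int32(b1)
        return np.int32(acc + p0 + p1)    # 32-bit accumulation, wrapping on overflow as hardware would

Note that the fixed-point mode performs two multiplications per call while the floating-point mode performs one, which is why the merged unit offers higher throughput for quantized inference within the same datapath.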
