Concurrent MAC unit design using VHDL for deep learning networks on FPGA

IEEE Symposium on Computer Applications and Industrial Electronics

Abstract

Deep neural network algorithms have proven their enormous capabilities in a wide range of artificial intelligence applications, especially in printed/handwritten text recognition, multimedia processing, robotics and many other high-end technological trends. The most challenging aspect nowadays is meeting the extreme computational demands of such algorithms, especially in real-time systems. Recently, the Field Programmable Gate Array (FPGA) has been considered one of the optimal hardware accelerator platforms for deep neural network architectures due to its great adaptability and the high degree of parallelism it offers. In this paper, the proposed 8-bit fixed-point parallel multiply-accumulate (MAC) unit architecture aims to provide a fully customized MAC unit for Convolutional Neural Networks (CNNs) instead of depending on the conventional DSP blocks and embedded memory units of the FPGA silicon fabric. The proposed 8-bit fixed-point parallel MAC unit is designed in VHDL and can achieve a computational throughput of up to 4.17 giga operations per second (GOPS) on high-density FPGAs.
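The paper's source code is not reproduced here, but the following is a minimal VHDL sketch of a single 8-bit signed fixed-point MAC stage of the kind the abstract describes, built from plain logic rather than DSP blocks. The entity name, port names and the 24-bit accumulator width are illustrative assumptions, not taken from the paper.

-- Minimal sketch of one 8-bit fixed-point MAC stage (assumed interface).
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mac8 is
    port (
        clk     : in  std_logic;
        rst     : in  std_logic;              -- synchronous clear of the accumulator
        en      : in  std_logic;              -- accumulate-enable strobe
        a, b    : in  signed(7 downto 0);     -- 8-bit fixed-point operands (e.g. pixel, weight)
        acc_out : out signed(23 downto 0)     -- running sum of products
    );
end entity mac8;

architecture rtl of mac8 is
    signal acc : signed(23 downto 0) := (others => '0');
begin
    process(clk)
    begin
        if rising_edge(clk) then
            if rst = '1' then
                acc <= (others => '0');
            elsif en = '1' then
                -- 8x8 signed multiply yields 16 bits; sign-extend before accumulating
                acc <= acc + resize(a * b, acc'length);
            end if;
        end if;
    end process;
    acc_out <= acc;
end architecture rtl;

In a parallel CNN datapath, several such units would be instantiated side by side, one per kernel tap or output channel, with their accumulators summed or read out after the convolution window completes.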