Conference: International Conference on VLSI Design; International Conference on Embedded Systems

TileNET: Scalable Architecture for High-Throughput Ternary Convolution Neural Networks Using FPGAs

Abstract

Convolutional Neural Networks (CNNs) are becoming increasingly popular in advanced driver assistance systems (ADAS) and automated driving (AD) for camera perception, enabling applications such as object detection, lane detection and semantic segmentation. The ever-increasing need for multiple high-resolution cameras around the car necessitates very high throughput, on the order of a few tens of TeraMACs per second (TMACS), along with high detection accuracy. Existing implementations do not scale, with performance only on the order of a few giga-operations per second. This paper proposes a novel tiled architecture for CNNs that uses only ternarized weights, while input and output features are kept at full precision, resulting in minimal loss of accuracy. The proposed solution is implemented on a Virtex-7 FPGA, achieving a throughput of 13.76 TOPS. In post-implementation power simulation, AlexNet consumes 16 W, orders of magnitude lower than existing GPUs.
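To illustrate why ternarized weights with full-precision features are attractive for hardware, here is a minimal Python sketch. It is not the TileNET architecture itself: the fixed-threshold ternarization rule, the function names, and the 1-D convolution are all assumptions made for illustration, since the abstract does not say how the weights are ternarized or how the tiles are organized. The point is only that once weights are restricted to -1, 0 and +1, each multiply-accumulate reduces to an add, a subtract, or a skip.

```python
from typing import List


def ternarize(weights: List[float], threshold: float) -> List[int]:
    """Map each full-precision weight to -1, 0 or +1.

    The fixed threshold is an illustrative assumption, not the paper's rule.
    """
    ternary = []
    for w in weights:
        if w > threshold:
            ternary.append(1)
        elif w < -threshold:
            ternary.append(-1)
        else:
            ternary.append(0)
    return ternary


def ternary_dot(features: List[float], tweights: List[int]) -> float:
    """Dot product of full-precision features with ternary weights.

    No multiplication is needed: +1 adds, -1 subtracts, 0 skips.
    """
    acc = 0.0
    for x, t in zip(features, tweights):
        if t == 1:
            acc += x
        elif t == -1:
            acc -= x
    return acc


def conv1d_ternary(signal: List[float], tkernel: List[int]) -> List[float]:
    """Valid (no padding) 1-D sliding-window correlation with a ternary kernel."""
    k = len(tkernel)
    return [ternary_dot(signal[i:i + k], tkernel)
            for i in range(len(signal) - k + 1)]


if __name__ == "__main__":
    tw = ternarize([0.9, -0.05, -0.6, 0.2], threshold=0.3)
    print(tw)
    print(ternary_dot([1.5, 2.0, 0.5, 4.0], tw))
    print(conv1d_ternary([1.0, 2.0, 3.0, 4.0], [1, -1]))
```

In this sketch the features stay as ordinary floats throughout, mirroring the abstract's statement that input and output features are kept at full precision while only the weights are constrained.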
