International Symposium on Computing and Networking Workshops

Horizontal division of deep learning applications with all-to-all communication on a multi-FPGA system


Abstract

Although convolutional neural networks (CNNs) offer abundant parallelism, traditional layer-by-layer task division on multi-FPGA systems has two problems: (1) the computational load differs from layer to layer, so the execution time is dominated by the heaviest layer; and (2) each FPGA must be designed independently, which means that a variety of configuration files must be designed, generated, and managed. To address these problems, we propose a horizontal division method that allows a single design to be used on every FPGA. Every layer of the target CNN is divided in the horizontal direction, and each FPGA implements its share of all the layers. This reduces both the design time and the management cost of execution. In addition, because the weight data can be partitioned among the FPGAs, local memory usage is reduced. The obvious disadvantage of this method is that it requires all-to-all data communication between FPGA boards, so it is not suitable for traditional multi-FPGA systems connected by a simple linear network. We therefore applied the method to FiC (Flow-in-Cloud), which has a powerful network that enables efficient broadcasting. A simple CNN, LeNet, and a matrix multiplication representing a more practical fully connected layer were implemented on the FiC prototype. In the evaluation, LeNet on 8 FPGAs ran 7.5 times faster than on a single FPGA and 12.6 times faster than optimized software on a high-end CPU.
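The following sketch illustrates the idea of horizontal division for fully connected layers only. It is a single-process simulation under stated assumptions, not the authors' FiC implementation. Each hypothetical "node" stands in for one FPGA running the same design. It stores only its block of rows of every weight matrix and computes its slice of each layer's output from the full input vector. After each layer, an all-to-all exchange gives every node the complete activation vector again. The function names (`all_to_all_gather`, `horizontal_forward`, `reference_forward`), the ReLU activation, and the layer sizes (loosely modeled on LeNet's 400→120→84→10 fully connected layers) are all illustrative assumptions. Convolutional layers, which would be split by output channel, are not shown.

```python
import numpy as np


def all_to_all_gather(slices):
    """Simulated all-to-all step (hypothetical helper): every node broadcasts
    its output slice, and every node reassembles the full activation vector."""
    n = len(slices)
    return [np.concatenate([slices[src] for src in range(n)]) for _dst in range(n)]


def horizontal_forward(x, weights, n_nodes):
    """Fully connected layers, each split row-wise across n_nodes simulated FPGAs."""
    # Partition once: node k keeps only its row block of every layer,
    # so per-node weight storage is roughly 1/n_nodes of the total.
    local_weights = [np.array_split(w, n_nodes, axis=0) for w in weights]
    # Every node starts with a copy of the full input vector.
    inputs = [x.copy() for _ in range(n_nodes)]
    for layer in range(len(weights)):
        # Identical computation on every node: own weight block times full input,
        # followed by an (assumed) ReLU activation.
        slices = [np.maximum(local_weights[layer][k] @ inputs[k], 0.0)
                  for k in range(n_nodes)]
        # Exchange slices so that each node again holds the full activation.
        inputs = all_to_all_gather(slices)
    return inputs[0]


def reference_forward(x, weights):
    """Single-node baseline: the same layers computed without any division."""
    for w in weights:
        x = np.maximum(w @ x, 0.0)
    return x


rng = np.random.default_rng(0)
weights = [rng.standard_normal((120, 400)),
           rng.standard_normal((84, 120)),
           rng.standard_normal((10, 84))]
x = rng.standard_normal(400)
out = horizontal_forward(x, weights, n_nodes=8)
print(np.allclose(out, reference_forward(x, weights)))  # True
```

Each node performs only about 1/n_nodes of the multiply-accumulate work and stores about 1/n_nodes of the weights. In exchange, every slice must reach every node at each layer boundary. This is why the approach depends on a network with efficient broadcasting rather than a simple linear chain of boards.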
