Fast Depthwise Separable Convolution for Embedded Systems

Abstract

Convolutional neural networks (CNNs) have achieved outstanding performance in many applications. However, as the number of layers has increased and model structures have become more complex, computational cost has become a concern. Large models cannot run in embedded or mobile environments, where hardware resources are quite limited. Several approaches have been tried to overcome these problems, such as reducing network depth, pruning, quantization, and low-rank approximation. Depthwise separable convolution (DSC) was proposed to reduce computation, especially in convolutional layers, by separating one convolution into a spatial convolution and a pointwise convolution. In this paper, we apply DSC to the YOLO network for object detection and propose a faster version of DSC, FastDSC, which replaces the pointwise convolution with general matrix multiplication. Experiments on the NVIDIA Jetson TX2 board show that FastDSC speeds up DSC for object detection.
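
To make the idea in the abstract concrete, the following is a minimal NumPy sketch of a depthwise separable convolution in which the pointwise (1x1) step is written both as explicit loops and as a single matrix multiplication. It is not the authors' FastDSC implementation, which targets the Jetson TX2 and is not described in detail here. The function names, the channel-first layout, and the stride-1, no-padding setting are all assumptions made for illustration.

```python
import numpy as np


def depthwise_conv(x, dw):
    """Spatial (depthwise) step: one k x k filter per input channel.

    x:  (C, H, W) input feature map
    dw: (C, k, k) one filter per channel
    Assumes stride 1 and no padding; returns (C, H-k+1, W-k+1).
    """
    C, H, W = x.shape
    k = dw.shape[1]
    Ho, Wo = H - k + 1, W - k + 1
    out = np.zeros((C, Ho, Wo), dtype=x.dtype)
    for i in range(k):
        for j in range(k):
            # Each filter tap scales a shifted view of its own channel only.
            out += dw[:, i, j][:, None, None] * x[:, i:i + Ho, j:j + Wo]
    return out


def pointwise_conv_loops(x, pw):
    """Pointwise (1x1) step written as explicit channel loops.

    x:  (C, H, W), pw: (M, C) -> returns (M, H, W).
    """
    C, H, W = x.shape
    M = pw.shape[0]
    out = np.zeros((M, H, W), dtype=x.dtype)
    for m in range(M):
        for c in range(C):
            out[m] += pw[m, c] * x[c]
    return out


def pointwise_conv_gemm(x, pw):
    """The same 1x1 step as one matrix product: (M, C) @ (C, H*W)."""
    C, H, W = x.shape
    return (pw @ x.reshape(C, H * W)).reshape(-1, H, W)


def mult_adds(C, M, k, Ho, Wo):
    """Multiply-add counts for a standard conv vs. a depthwise separable one."""
    standard = k * k * C * M * Ho * Wo
    separable = k * k * C * Ho * Wo + C * M * Ho * Wo
    return standard, separable


# Hypothetical sizes chosen only for the demonstration.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))    # C=4 input channels, 8x8 map
dw = rng.standard_normal((4, 3, 3))   # 3x3 depthwise filters
pw = rng.standard_normal((6, 4))      # M=6 output channels

y = depthwise_conv(x, dw)
same = np.allclose(pointwise_conv_loops(y, pw), pointwise_conv_gemm(y, pw))
standard, separable = mult_adds(C=4, M=6, k=3, Ho=6, Wo=6)
print(same, separable / standard)
```

Because a 1x1 convolution mixes channels independently at every pixel, flattening the spatial dimensions turns it into one (M, C) by (C, H*W) product, which an optimized GEMM routine can execute. The ratio of separable to standard multiply-adds works out to 1/M + 1/k^2 in this setting.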