Journal of Real-Time Image Processing

Speeding up inference on deep neural networks for object detection by performing partial convolution


Abstract

Real-time object detection is an anticipated application of deep neural networks (DNNs). It can be achieved by employing graphics processing units (GPUs) or dedicated hardware accelerators. Alternatively, in this work, we present a software scheme to accelerate the inference stage of DNNs designed for object detection. The scheme relies on partial processing within the consecutive convolution layers of a DNN. It exploits the relationships between the locations of the components of an input feature, an intermediate feature representation, and an output feature to efficiently identify the modified components. This downsizes the matrix multiplicand to cover only those modified components, so matrix multiplication within a convolution layer is accelerated. In addition, the same relationships can be used to signal the modified components to the next convolution layer, which further reduces the overhead of member-by-member comparison for identifying them. The proposed scheme has been experimentally benchmarked against a conceptually similar approach, CBinfer, and against the original Darknet, on the Tiny-You Only Look Once (Tiny-YOLO) network. The experiments were conducted on a personal computer with dual CPUs running at 3.5 GHz, without GPU acceleration, using video data sets from YouTube. The results show that average improvement ratios of 1.56 and 13.10 in detection frame rate over CBinfer and Darknet, respectively, are attainable. Our scheme was also extended to exploit GPU-assisted acceleration: on an NVIDIA Jetson TX2, it reached a detection frame rate of 28.12 frames per second (1.25x with respect to CBinfer). Detection accuracy in all experiments was preserved at 90% of that of the original Darknet.
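The core idea of the abstract, recomputing a convolution only at output positions whose receptive fields contain components that changed between consecutive inputs, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: it assumes a stride-1, unpadded convolution, uses a simple per-location change mask rather than the paper's layer-to-layer signaling, and all function names (`conv2d_full`, `conv2d_partial`) are illustrative.

```python
import numpy as np

def conv2d_full(x, w):
    """Naive valid (stride-1, no-padding) convolution.
    x: input feature map (C, Hin, Win); w: filters (K, C, kh, kw)."""
    K, C, kh, kw = w.shape
    H, W = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.zeros((K, H, W))
    for i in range(H):
        for j in range(W):
            patch = x[:, i:i + kh, j:j + kw]
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def conv2d_partial(x, prev_x, prev_out, w, threshold=0.0):
    """Partial convolution: reuse the cached output of the previous input
    and recompute only the output positions whose receptive field overlaps
    a changed input component."""
    K, C, kh, kw = w.shape
    H, W = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    # Spatial mask of modified input locations (any channel changed).
    changed = np.abs(x - prev_x).max(axis=0) > threshold
    out = prev_out.copy()
    for i in range(H):
        for j in range(W):
            # Output (i, j) depends on the kh x kw input window at (i, j).
            if changed[i:i + kh, j:j + kw].any():
                patch = x[:, i:i + kh, j:j + kw]
                out[:, i, j] = np.tensordot(w, patch,
                                            axes=([1, 2, 3], [0, 1, 2]))
    return out
```

When only a small region of the frame changes, as is typical for fixed-camera video, the inner product is evaluated for only a small fraction of output positions, which is the source of the speedup described above; the change mask itself plays the role of the signal that could be forwarded to the next layer.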
