RSANet: Towards Real-Time Object Detection with Residual Semantic-Guided Attention Feature Pyramid Network

Zhou Quan; Wang Jie; Liu Jia; Li Shenghua; Ou Weihua; Jin Xin

Abstract

Heavy computational overhead limits the inference of convolutional neural networks on mobile devices for object detection, a task that plays a critical role in many real-world applications such as face identification, autonomous driving, and video surveillance. To address this problem, this paper introduces a lightweight convolutional neural network called RSANet. RSANet consists of two parts: (a) a Lightweight Convolutional Network (LCNet) as the backbone, and (b) a Residual Semantic-guided Attention Feature Pyramid Network (RSAFPN) as the detection head. In LCNet, in contrast to recent lightweight networks that rely on pointwise convolution to change the number of feature maps, we design a Constant Channel Module (CCM) to reduce the Memory Access Cost (MAC) and a Down Sampling Module (DSM) to reduce the computational cost. In RSAFPN, we employ a Residual Semantic-guided Attention Mechanism (RSAM) to fuse the multi-scale features from LCNet, improving detection performance efficiently. Experimental results show that, on the PASCAL VOC 2007 dataset, RSANet has a model size of only 3.24 M and needs only 3.54B FLOPs with a 416x416 input image. Compared to YOLO Nano, our method obtains a 6.7% improvement in accuracy while requiring less computation. On the MS COCO dataset, RSANet has a model size of only 4.35 M and needs only 2.34B FLOPs with a 320x320 input image, and obtains a 1.3% improvement in accuracy compared to Pelee. These comprehensive experimental results demonstrate that our model achieves a promising trade-off between speed and accuracy.
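
The abstract's point about pointwise convolution and Memory Access Cost can be illustrated with a small calculation. The sketch below is not the authors' CCM; it only uses the standard cost model for a 1x1 convolution (FLOPs = h*w*c_in*c_out, MAC = h*w*(c_in + c_out) + c_in*c_out) to show that, for a fixed FLOPs budget, keeping the channel count constant gives the lowest MAC. The function name `pointwise_conv_cost`, the 52x52 feature map size, and the channel counts are illustrative assumptions, not values from the paper.

```python
def pointwise_conv_cost(h, w, c_in, c_out):
    """Return (flops, mac) for a 1x1 convolution on an h x w feature map.

    flops: multiply-accumulate count, h * w * c_in * c_out
    mac:   elements read and written (input map + output map + weights)
    Hypothetical helper for illustration; not taken from the paper.
    """
    flops = h * w * c_in * c_out
    mac = h * w * (c_in + c_out) + c_in * c_out
    return flops, mac


# Three layers with the same FLOPs (c_in * c_out = 4096) but different
# input/output channel ratios. The 52 x 52 map size is an assumed example.
for c_in, c_out in [(64, 64), (32, 128), (16, 256)]:
    flops, mac = pointwise_conv_cost(52, 52, c_in, c_out)
    print(f"c_in={c_in:3d} c_out={c_out:3d} flops={flops} mac={mac}")
```

Under this cost model the equal-channel case (64 to 64) has the smallest MAC of the three, which is consistent with the motivation the abstract gives for a module that keeps the channel count constant.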

Bibliographic details

Source
Mobile Networks & Applications | 2021, Issue 1 | pp. 77-87 | 11 pages
Authors
Zhou Quan; Wang Jie; Liu Jia; Li Shenghua; Ou Weihua; Jin Xin
Author affiliations

Nanjing Univ Posts & Telecommun Coll Telecommun & Informat Engn Nanjing Peoples R China;

Nanjing Univ Posts & Telecommun Coll Telecommun & Informat Engn Nanjing Peoples R China;

Nanjing Univ Posts & Telecommun Coll Telecommun & Informat Engn Nanjing Peoples R China;

Nanjing Univ Posts & Telecommun Coll Telecommun & Informat Engn Nanjing Peoples R China;

Guizhou Normal Univ Sch Big Data & Comp Sci Guiyang Peoples R China;

Beijing Elect Sci & Technol Inst Dept Comp Sci & Technol Beijing Peoples R China;

Original format: PDF
Language: English
Keywords
Real-time; Object detection; Lightweight convolutional network; Visual attention; FPN
