International Conference on Pattern Recognition

Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting



Abstract

In this paper, we propose two modified neural networks, based on the dual path multi-scale fusion network (SFANet) and SegNet, for accurate and efficient crowd counting. Inspired by SFANet, the first model, named M-SFANet, is augmented with atrous spatial pyramid pooling (ASPP) and a context-aware module (CAN). The encoder of M-SFANet is enhanced with ASPP, which contains parallel atrous convolutional layers with different sampling rates and is hence able to extract multi-scale features of the target object and incorporate larger context. To further deal with scale variation throughout an input image, we leverage the CAN module, which adaptively encodes the scales of the contextual information. The combination yields an effective model for counting in both dense and sparse crowd scenes. Based on the SFANet decoder structure, M-SFANet's decoder has dual paths for density map and attention map generation. The second model, called M-SegNet, is produced by replacing the bilinear upsampling in SFANet with the max unpooling used in SegNet. This change yields a faster model while maintaining competitive counting performance. Designed for high-speed surveillance applications, M-SegNet has no additional multi-scale-aware module so as not to increase complexity. Both models are encoder-decoder based architectures and are end-to-end trainable. We conduct extensive experiments on five crowd counting datasets and one vehicle counting dataset to show that these modifications yield algorithms that could improve on state-of-the-art crowd counting methods. Code is available at https://github.com/Pongpisit-Thanasutives/Variations-of-SFANet-for-Crowd-Counting.
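As a brief illustration of the ASPP idea the abstract refers to (parallel atrous convolutions with different sampling rates fused into one multi-scale feature), below is a minimal PyTorch sketch. It is not the authors' implementation (see the repository linked above); the channel counts and dilation rates are illustrative assumptions.

import torch
import torch.nn as nn

class ASPPSketch(nn.Module):
    # Minimal atrous spatial pyramid pooling block (illustrative only).
    # Parallel 3x3 convolutions with different dilation rates see the same
    # feature map at different receptive-field sizes; their outputs are
    # concatenated and projected back, giving multi-scale features.
    def __init__(self, in_ch=512, out_ch=256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3,
                          padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        # 1x1 projection back to out_ch after concatenating all branches
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        return self.project(torch.cat(feats, dim=1))

# Example: a VGG-style encoder output of shape (N, 512, H/8, W/8)
x = torch.randn(1, 512, 32, 32)
print(ASPPSketch()(x).shape)  # torch.Size([1, 256, 32, 32])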

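The M-SegNet change described in the abstract, replacing bilinear upsampling with SegNet-style max unpooling, can be summarized in a few lines of PyTorch. This is only a sketch of the two upsampling operations, not the authors' decoder; the tensor shapes are assumed for illustration.

import torch
import torch.nn.functional as F

x = torch.randn(1, 64, 32, 32)

# Encoder side: max pooling that also records the argmax indices.
pooled, indices = F.max_pool2d(x, kernel_size=2, stride=2, return_indices=True)

# SFANet-style decoder step: bilinear upsampling (interpolation, no indices needed).
up_bilinear = F.interpolate(pooled, scale_factor=2, mode='bilinear', align_corners=False)

# SegNet-style decoder step (used by M-SegNet): max unpooling places each
# pooled value back at the exact location it came from, using the indices.
up_unpool = F.max_unpool2d(pooled, indices, kernel_size=2, stride=2)

print(up_bilinear.shape, up_unpool.shape)  # both torch.Size([1, 64, 32, 32])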