Journal of Imaging

PedNet: A Spatio-Temporal Deep Convolutional Neural Network for Pedestrian Segmentation



Abstract

Articulation modeling, feature extraction, and classification are important components of pedestrian segmentation. Usually, these components are modeled independently of each other and then combined sequentially. However, this approach is prone to poor segmentation if any individual component is weakly designed. To cope with this problem, we propose a spatio-temporal convolutional neural network named PedNet, which exploits temporal information for spatial segmentation. The backbone of PedNet is an encoder–decoder network that downsamples and then upsamples the feature maps. The input to the network is a set of three consecutive frames, and the output is a binary mask of the segmented regions in the middle frame. Unlike classical deep models in which the convolutional layers are followed by a fully connected layer for classification, PedNet is a Fully Convolutional Network (FCN). It is trained end-to-end, and segmentation is achieved without any pre- or post-processing. The main characteristic of PedNet is its design: it performs segmentation on a frame-by-frame basis, but uses temporal information from the previous and the next frame to segment the pedestrians in the current frame. Moreover, to combine the low-level features with the high-level semantic information learned by the deeper layers, we use long skip connections from the encoder to the decoder and concatenate the outputs of the low-level layers with those of the higher-level layers. This helps produce segmentation maps with sharp boundaries. To show the potential benefits of temporal information, we also visualized different layers of the network. The visualization shows that the network learns different information from the consecutive frames and then combines this information to segment the middle frame. We evaluated our approach on eight challenging datasets in which humans are involved in different activities with severe articulation (football, road crossing, surveillance). On the widely used CamVid dataset, our approach is compared against seven state-of-the-art methods. Performance is reported in terms of precision/recall, F1, F2, and mIoU. The qualitative and quantitative results show that PedNet achieves promising results compared with state-of-the-art methods, with substantial improvement in all the performance metrics.
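
The architecture described in the abstract (three stacked input frames, an encoder–decoder FCN, long skip connections by concatenation, and a single-channel mask for the middle frame) can be illustrated with a minimal sketch. The code below is an assumption-laden PyTorch sketch, not the authors' implementation: the layer widths, the depth, the class name PedNetSketch, and the choice to stack the three RGB frames along the channel axis are illustrative only.

```python
# Minimal sketch (not the authors' code) of an encoder-decoder FCN with long
# skip connections that takes three stacked frames and predicts a binary mask
# for the middle frame. Layer widths and depth are illustrative assumptions.
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions with ReLU, preserving spatial size."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )


class PedNetSketch(nn.Module):
    def __init__(self, frames=3, base=32):
        super().__init__()
        # Encoder: the three frames are stacked along the channel axis
        # (frames * 3 RGB channels) so temporal context reaches every layer.
        self.enc1 = conv_block(frames * 3, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(base * 2, base * 4)
        # Decoder: upsample and concatenate with the matching encoder output
        # (long skip connection) to recover sharp boundaries.
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, kernel_size=2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, kernel_size=2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        # 1x1 convolution to a single-channel logit map; a sigmoid threshold
        # at inference time yields the binary pedestrian mask.
        self.head = nn.Conv2d(base, 1, kernel_size=1)

    def forward(self, x):
        # x: (batch, frames * 3, H, W) -- previous, current, next frame stacked
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        b = self.bottleneck(self.pool(e2))
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)  # logits for the middle frame's mask


if __name__ == "__main__":
    frames = torch.randn(1, 9, 128, 128)  # three RGB frames stacked
    mask_logits = PedNetSketch()(frames)
    print(mask_logits.shape)  # torch.Size([1, 1, 128, 128])
```

The key point the sketch tries to convey is the one stated in the abstract: the network is fully convolutional, so no fully connected classification layer is needed, and the concatenation of low-level encoder features with higher-level decoder features is what supports sharp segmentation boundaries.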
