IEEE Transactions on Vehicular Technology

Video Foreground Extraction Using Multi-View Receptive Field and Encoder–Decoder DCNN for Traffic and Surveillance Applications


Abstract

The automatic detection of foreground (FG) objects in videos is a demanding area of computer vision, with essential applications in video-based traffic analysis and surveillance. Recent solutions have attempted to exploit deep neural networks (DNNs) for this purpose. Unlike image segmentation, learning agents, i.e., features, for video FG object segmentation is nontrivial: it is a temporally processed decision-making problem in which the agents involved are the spatial and temporal correlations of the FG objects and the background (BG) of the scene. To handle this, and to overcome the poor delineation at the borders of FG regions that conventional DL models exhibit due to fixed-view receptive-field-based learning, this work introduces a Multi-view Receptive Field Encoder-Decoder Convolutional Neural Network, called MvRF-CNN. The main contribution of the model is harnessing multiple views of convolutional (conv) kernels with residual feature fusions at the early, mid, and late stages of an encoder-decoder (EnDec) architecture. This enhances the model's ability to learn condition-invariant agents, yielding more sharply delineated FG masks than existing approaches, from heuristic- to DL-based techniques. The model is trained with sequence-specific labeled samples to predict scene-specific pixel-level labels of FG objects in near-static scenes with minute dynamism. An experimental study on 37 video sequences from traffic and surveillance scenarios covering complex environments, viz. dynamic backgrounds, camera jitter, intermittent object motion, cast shadows, night videos, and bad weather, demonstrates the effectiveness of the model. The study covers two input configurations: a 3-channel (RGB) single frame and a 3-channel double frame with BG, in which two consecutive grayscale frames are stacked with a prior BG model. Ablation investigations are also conducted to show the importance of transfer learning (TL) and mid-fusion approaches for enhancing segmentation performance, and to probe the model's robustness in two failure modes: when manually annotated hard ground truths (HGT) are lacking, and when the model is tested on non-scene-specific videos. Overall, the model achieves a mean average performance of a $95\%$ figure-of-merit at 42 FPS.
