Journal: Circuits, Systems, and Signal Processing

Deep Multi-view Representation Learning for Video Anomaly Detection Using Spatiotemporal Autoencoders



Abstract

Visual perception is a transformative technology that can recognize patterns from environments through visual inputs. Automatic surveillance of human activities has gained significant importance in both public and private spaces. It is often difficult to understand the complex dynamics of events in real-time scenarios due to camera movement, cluttered backgrounds, and occlusion. Existing anomaly detection systems are inefficient because of the high intra-class variation and inter-class similarity among activities. Hence, there is a demand to explore different kinds of information extracted from surveillance videos to improve overall performance. This can be achieved by learning features from multiple forms (views) of the given raw input data. We propose two novel methods based on the multi-view representation learning framework. The first is a hybrid multi-view representation learning approach that combines deep features extracted from a 3D spatiotemporal autoencoder (3D-STAE) with robust handcrafted features based on the spatiotemporal autocorrelation of gradients. The second is a deep multi-view representation learning approach that combines deep features extracted from two-stream STAEs to detect anomalies. Results on three standard benchmark datasets, namely Avenue, Live Videos, and BEHAVE, show that the proposed multi-view representations modeled with a one-class SVM perform significantly better than most recent state-of-the-art methods.

