Conference: IEEE Conference on Telecommunications, Optics and Computer Science

Violence Detection Based on Three-Dimensional Convolutional Neural Network with Inception-ResNet

Abstract

Violence detection based on deep learning is a research hotspot in intelligent video surveillance. The pre-trained three-dimensional convolutional network (C3D) is weak at extracting spatiotemporal features from video: it achieves an accuracy of only 88.2% on the UCF-101 dataset, which does not meet the accuracy requirements for detecting violent behavior in videos. This paper therefore proposes a network architecture that builds on C3D and fuses in the residual Inception module of the Inception-ResNet-v2 network. First, through adaptive learning of feature weights, the three-dimensional features of violent-behavior videos are explored more fully and the network's feature representation ability is enhanced. Second, because the violence detection dataset (HockeyFights) is small, which easily leads to overfitting and poor generalization, the UCF-101 dataset is used for fine-tuning so that the shallow layers of the network can fully extract spatiotemporal features. Finally, quantizing the network parameters with quantization tools and adjusting the sliding-window parameters during inference effectively improves inference efficiency and real-time performance while maintaining high accuracy. In experiments, the accuracy of the proposed network on the UCF-101 dataset is 6.1% higher than that of the C3D network and 3.1% higher than that of the R3D network, indicating that the improved network structure extracts more spatiotemporal features. The network finally achieves an accuracy of 95.1% on the HockeyFights test set.
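
The abstract does not say which sliding-window parameters are adjusted or how clip-level outputs are combined. As a rough illustration only, the sketch below assumes a fixed clip length and stride and averages hypothetical per-clip violence scores into one video-level decision. The names `window_starts`, `video_score`, `classify_video`, and `score_clip`, the default values (16-frame clips, stride 8, threshold 0.5), and the averaging rule are all assumptions made for this example, not details taken from the paper.

```python
from statistics import mean


def window_starts(num_frames, clip_len=16, stride=8):
    """Return the start index of every full clip that fits in the video.

    clip_len and stride are illustrative defaults, not values from the paper.
    """
    if clip_len <= 0 or stride <= 0:
        raise ValueError("clip_len and stride must be positive")
    if num_frames < clip_len:
        return []  # video is shorter than one clip
    return list(range(0, num_frames - clip_len + 1, stride))


def video_score(clip_scores):
    """Average per-clip scores into a video-level score (assumed rule)."""
    return mean(clip_scores) if clip_scores else 0.0


def classify_video(frames, score_clip, clip_len=16, stride=8, threshold=0.5):
    """Slide a window over the frames and score each clip.

    score_clip is a hypothetical callable standing in for the network: it
    takes a list of frames and returns a violence score in [0, 1].
    Returns (is_violent, number_of_clips_evaluated).
    """
    starts = window_starts(len(frames), clip_len, stride)
    scores = [score_clip(frames[s:s + clip_len]) for s in starts]
    return video_score(scores) >= threshold, len(starts)
```

Under these assumptions, a larger stride means fewer forward passes per video: for a 64-frame video with 16-frame clips, a stride of 8 evaluates 7 clips and a stride of 16 evaluates 4. That is one way a sliding-window setting can trade inference cost against temporal coverage.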
