IEEE Transactions on Circuits and Systems for Video Technology

Video Classification With CNNs: Using the Codec as a Spatio-Temporal Activity Sensor

Abstract

We investigate video classification via a two-stream convolutional neural network (CNN) design that directly ingests information extracted from compressed video bitstreams. Our approach begins with the observation that all modern video codecs divide the input frames into macroblocks (MBs). We demonstrate that selective access to MB motion vector (MV) information within compressed video bitstreams can also enable selective, motion-adaptive MB pixel decoding (a.k.a. MB texture decoding). This in turn allows spatio-temporal video activity regions to be derived at extremely high speed compared with conventional full-frame decoding followed by optical flow estimation. To evaluate the accuracy of a video classification framework based on such activity data, we independently train two CNN architectures on MB texture and MV correspondences, and then fuse their scores to derive the final classification of each test video. Evaluation on two standard data sets shows that the proposed approach is competitive with the best two-stream video classification approaches in the literature. At the same time: 1) a CPU-based realization of our MV extraction is over 977 times faster than GPU-based optical flow methods; 2) selective decoding is up to 12 times faster than full-frame decoding; and 3) our proposed spatial and temporal CNNs perform inference at 5 to 49 times lower cloud computing cost than the fastest methods from the literature.
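The abstract does not spell out how MV activity decides which MBs get texture-decoded. As a minimal sketch under stated assumptions, the snippet below takes one motion vector per macroblock (assumed already parsed from the bitstream, with 16x16 luma MBs as in H.264/AVC), marks an MB as active when its MV magnitude exceeds a threshold, and returns the MBs to decode, the fraction of the frame that would be decoded, and the pixel bounding box of the active region. The magnitude-threshold rule, the threshold value, and the function names (`activity_mask`, `mbs_to_decode`, `decoded_fraction`, `active_bbox`) are hypothetical illustrations, not the authors' selection criterion.

```python
import math
from typing import List, Optional, Tuple

# Assumption: one motion vector (dx, dy) in pixels per macroblock,
# already parsed from the compressed bitstream (no pixel decoding needed).
MotionVector = Tuple[float, float]
Grid = List[List[MotionVector]]
Mask = List[List[bool]]

MB_SIZE = 16  # assumed 16x16 luma macroblocks, as in H.264/AVC


def activity_mask(mv_grid: Grid, threshold: float) -> Mask:
    """Mark a macroblock active if its MV magnitude exceeds `threshold`.

    Hypothetical selection rule for illustration only.
    """
    return [[math.hypot(dx, dy) > threshold for dx, dy in row] for row in mv_grid]


def mbs_to_decode(mask: Mask) -> List[Tuple[int, int]]:
    """(row, col) indices of the MBs whose texture would be decoded."""
    return [(r, c) for r, row in enumerate(mask) for c, active in enumerate(row) if active]


def decoded_fraction(mask: Mask) -> float:
    """Share of MBs that selective decoding would touch (1.0 = full-frame decoding)."""
    total = sum(len(row) for row in mask)
    return len(mbs_to_decode(mask)) / total if total else 0.0


def active_bbox(mask: Mask, mb_size: int = MB_SIZE) -> Optional[Tuple[int, int, int, int]]:
    """Pixel bounding box (x0, y0, x1, y1), x1/y1 exclusive, around all active MBs."""
    coords = mbs_to_decode(mask)
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(cols) * mb_size, min(rows) * mb_size,
            (max(cols) + 1) * mb_size, (max(rows) + 1) * mb_size)


if __name__ == "__main__":
    grid = [[(0, 0), (3, 4), (0, 1)],
            [(0, 0), (0, 0), (6, 8)]]
    mask = activity_mask(grid, threshold=2.0)
    print(mbs_to_decode(mask))     # [(0, 1), (1, 2)]
    print(decoded_fraction(mask))  # 0.3333333333333333
    print(active_bbox(mask))       # (16, 0, 48, 32)
```

In this toy 2x3 grid only two of six MBs have MV magnitudes above the threshold (5 and 10 pixels), so a selective decoder would reconstruct a third of the MBs instead of the whole frame. The saving in a real stream depends entirely on how much of the scene moves and on the actual selection rule.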
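The abstract states only that the scores of the two independently trained CNNs are fused into one classification per test video. A common choice in two-stream networks is late fusion by a weighted average of per-class softmax probabilities; the sketch below assumes that scheme, with made-up logits and a hypothetical `w_temporal` weight, and should not be read as the paper's exact fusion rule.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one video's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(spatial_logits: Sequence[float],
                temporal_logits: Sequence[float],
                w_temporal: float = 0.5) -> List[float]:
    """Late fusion: weighted average of per-stream class probabilities.

    `w_temporal` is a hypothetical weight; the paper's fusion rule may differ.
    """
    if len(spatial_logits) != len(temporal_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= w_temporal <= 1.0:
        raise ValueError("w_temporal must lie in [0, 1]")
    p_spatial = softmax(spatial_logits)    # stream trained on MB texture
    p_temporal = softmax(temporal_logits)  # stream trained on MVs
    return [(1.0 - w_temporal) * s + w_temporal * t
            for s, t in zip(p_spatial, p_temporal)]


def predict(fused: Sequence[float]) -> int:
    """Index of the highest fused score, i.e. the predicted class."""
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    spatial = [2.0, 1.0, 0.0]   # spatial stream favours class 0
    temporal = [0.0, 3.0, 0.0]  # temporal stream favours class 1
    fused = fuse_scores(spatial, temporal)
    print(predict(fused))       # 1
```

With equal weights, the temporal stream's confident vote for class 1 outweighs the spatial stream's weaker preference for class 0; setting `w_temporal=0.0` recovers the spatial-only prediction. Averaging probabilities rather than raw logits keeps the two streams on a comparable scale even if their logits differ in magnitude.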
