Video Classification With CNNs: Using the Codec as a Spatio-Temporal Activity Sensor

Chadha Aaron; Abbas Alhabib; Andreopoulos Yiannis

Abstract

We investigate video classification via a two-stream convolutional neural network (CNN) design that directly ingests information extracted from compressed video bitstreams. Our approach begins with the observation that all modern video codecs divide the input frames into macroblocks (MBs). We demonstrate that selective access to MB motion vector (MV) information within compressed video bitstreams can also provide for selective, motion-adaptive, MB pixel decoding (a.k.a., MB texture decoding). This in turn allows for the derivation of spatio-temporal video activity regions at extremely high speed in comparison to conventional full-frame decoding followed by optical flow estimation. In order to evaluate the accuracy of a video classification framework based on such activity data, we independently train two CNN architectures on MB texture and MV correspondences and then fuse their scores to derive the final classification of each test video. Evaluation on two standard data sets shows that the proposed approach is competitive with the best two-stream video classification approaches found in the literature. At the same time: 1) a CPU-based realization of our MV extraction is over 977 times faster than GPU-based optical flow methods; 2) selective decoding is up to 12 times faster than full-frame decoding; and 3) our proposed spatial and temporal CNNs perform inference at 5 to 49 times lower cloud computing cost than the fastest methods from the literature.
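The two ideas in the abstract, selecting active macroblocks from motion vectors and fusing the scores of the two CNN streams, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `active_macroblocks` and `fuse_scores`, the magnitude threshold, and the weighted-average fusion rule are hypothetical, and the abstract does not state which selection criterion or fusion rule the authors use.

```python
import math


def active_macroblocks(mv_field, threshold=1.0):
    """Mark macroblocks whose motion-vector magnitude exceeds `threshold`.

    mv_field is a 2-D grid (list of rows) of (dx, dy) tuples, one per
    macroblock. The threshold rule is an illustrative assumption, not the
    paper's selection criterion.
    """
    return [[math.hypot(dx, dy) > threshold for (dx, dy) in row]
            for row in mv_field]


def fuse_scores(spatial_scores, temporal_scores, temporal_weight=0.5):
    """Late fusion: weighted average of per-class scores from two streams.

    Returns (predicted_class_index, fused_scores). The weighted average is
    an assumed fusion rule chosen for simplicity.
    """
    if len(spatial_scores) != len(temporal_scores):
        raise ValueError("both streams must score the same set of classes")
    w = temporal_weight
    fused = [(1.0 - w) * s + w * t
             for s, t in zip(spatial_scores, temporal_scores)]
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


# Toy motion-vector grid: 2 rows x 2 columns of macroblocks.
mv_grid = [[(0, 0), (3, 4)],
           [(0.5, 0.5), (0, -2)]]
mask = active_macroblocks(mv_grid, threshold=1.0)

# Toy per-class scores from the spatial (texture) and temporal (MV) streams.
label, fused = fuse_scores([0.7, 0.2, 0.1], [0.1, 0.8, 0.1])
```

In this sketch only the macroblocks flagged in `mask` would have their pixels decoded, which is the sense in which motion vectors can drive selective texture decoding; the fused scores then give one label per video.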

Bibliographic details

Source
IEEE Transactions on Circuits and Systems for Video Technology | 2019, Issue 2 | pp. 475-485 | 11 pages
Authors
Chadha Aaron; Abbas Alhabib; Andreopoulos Yiannis
Affiliations

UCL, Elect & Elect Engn Dept, London WC1E 7JE, England;

UCL, Elect & Elect Engn Dept, London WC1E 7JE, England;

UCL, Elect & Elect Engn Dept, London WC1E 7JE, England|Dithen, London NW11 8NA, England;

Indexing information
Original format: PDF
Language: English
Keywords
Video coding; classification; deep learning;


