Journal: Image and Vision Computing

Dynamic information enhancement for video classification



Abstract

How to extract and integrate spatiotemporal information for video classification is a major challenge. Advanced approaches adopt 2D or 3D convolution kernels, or their variants, as the basis of spatiotemporal modeling. However, 2D convolution kernels perform poorly along the temporal dimension, while 3D convolution kernels tend to confuse the spatial and temporal sources of information and risk an explosion in the number of model parameters. In this paper, we develop a more explicit way to improve the spatiotemporal modeling capacity of a 2D convolution network, which integrates two components: (1) a Motion Intensification Block (MIB), which mandates a specific subset of channels to explicitly encode temporal clues that complement the spatial patterns extracted by the other channels, achieving controlled diversity in the convolution calculations; and (2) a Spatial-temporal Squeeze-and-excitation (ST-SE) block, which intensifies the fused features according to the importance of different channels. In this manner, we enhance the spatiotemporal dynamic information within the 2D backbone network without performing complex temporal convolutions. To verify the effectiveness of the proposed approach, we conduct extensive experiments on challenging benchmarks. Our model achieves competitive results on Something-Something V1 and Something-Something V2 and state-of-the-art performance on the Diving48 dataset, providing supporting evidence for the merits of the proposed methodology of spatiotemporal information encoding and fusion for video classification. (c) 2021 Published by Elsevier B.V.
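The two components named in the abstract can be illustrated with a rough NumPy sketch. The abstract does not say how the MIB encodes temporal clues or how the ST-SE block is built, so everything below is an assumption made for illustration: frame-to-frame differences stand in for the motion encoding, and a standard squeeze-and-excitation gate pooled over time and space stands in for ST-SE. The function names, the `ratio` parameter, and the weights `w1` and `w2` are hypothetical and do not come from the paper.

```python
import numpy as np


def motion_channels(x, ratio=0.25):
    """Make a subset of channels carry temporal differences (assumed stand-in for MIB).

    x has shape (T, C, H, W). The first k = max(1, int(C * ratio)) channels are
    replaced by the difference between the next frame and the current one; the
    remaining channels keep their spatial features unchanged.
    """
    t, c, h, w = x.shape
    k = max(1, int(c * ratio))
    out = x.copy()
    diff = np.zeros_like(x[:, :k])
    # The last frame has no successor, so its motion channels stay zero.
    diff[:-1] = x[1:, :k] - x[:-1, :k]
    out[:, :k] = diff
    return out


def st_se_gate(x, w1, w2):
    """Channel reweighting in the style of squeeze-and-excitation (assumed stand-in for ST-SE).

    Squeeze: average over time and space to get one value per channel.
    Excitation: bottleneck (w1, ReLU) then expansion (w2, sigmoid).
    Returns the rescaled features and the per-channel gate.
    """
    squeezed = x.mean(axis=(0, 2, 3))                # shape (C,)
    hidden = np.maximum(w1 @ squeezed, 0.0)          # shape (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # shape (C,), values in (0, 1)
    return x * gate[None, :, None, None], gate


# Hypothetical usage: 8 frames, 16 channels, 7x7 feature maps, reduction factor 4.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 16, 7, 7))
w1 = rng.standard_normal((4, 16)) * 0.1
w2 = rng.standard_normal((16, 4)) * 0.1
fused, gate = st_se_gate(motion_channels(features), w1, w2)
```

In this sketch no temporal convolution is used: temporal information enters only through the differenced channels, and the gate then rescales all channels by a learned importance score. In a trained network `w1` and `w2` would be learned rather than random.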
